net: stmmac: configure PTP clock source prior to PTP initialization
For Intel platform, it is required to configure PTP clock source prior PTP
initialization in MAC. So, need to move ptp_clk_freq_config execution from
stmmac_ptp_register() to stmmac_init_ptp().
Fixes: 76da35dc99af ("stmmac: intel: Add PSE and PCH PTP clock source selection") Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This commit breaks Linux compatibility with USGv6 tests. The RFC this
commit was based on is actually an expired draft: no published RFC
currently allows the new behaviour it introduced.
Without full IETF endorsement, the flash renumbering scenario this
patch was supposed to enable is never going to work, as other IPv6
equipements on the same LAN will keep the 2 hours limit.
Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series introduces a helper function task_is_in_init_pid_ns()
to replace open code. The two patches are extracted from the original
series [1] for network subsystem.
As a plan, we can firstly land this patch set into kernel 5.18; there
have 5 patches are left out from original series [1], as a next step,
I will resend them for appropriate linux-next merging.
Leo Yan [Wed, 26 Jan 2022 05:04:26 +0000 (13:04 +0800)]
pid: Introduce helper task_is_in_init_pid_ns()
Currently the kernel uses open code in multiple places to check if a
task is in the root PID namespace with the kind of format:
if (task_active_pid_ns(current) == &init_pid_ns)
do_something();
This patch creates a new helper function, task_is_in_init_pid_ns(), it
returns true if a passed task is in the root PID namespace, otherwise
returns false. So it will be used to replace open codes.
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use GFP_ATOMIC when allocating pages out of the hotpath,
continue to use GFP_KERNEL when allocating pages during setup.
GFP_KERNEL will allow blocking which allows it to succeed
more often in a low memory enviornment but in the hotpath we do
not want to allow the allocation to block.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Link: https://lore.kernel.org/r/20220126003843.3584521-1-awogbemila@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Wed, 26 Jan 2022 15:45:49 +0000 (15:45 +0000)]
Merge branch 'lan966x-fixes'
Horatiu Vultur says:
====================
net: lan966x: Fixes for sleep in atomic context
This patch series contains 2 fixes for lan966x that is sleeping in atomic
context. The first patch fixes the injection of the frames while the second
one fixes the updating of the MAC table.
v1->v2:
- correct the fix tag in the second patch, it was using the wrong sha.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Tue, 25 Jan 2022 11:48:16 +0000 (12:48 +0100)]
net: lan966x: Fix sleep in atomic context when updating MAC table
The function lan966x_mac_wait_for_completion is used to poll the status
of the MAC table using the function readx_poll_timeout. The problem with
this function is that is called also from atomic context. Therefore
update the function to use readx_poll_timeout_atomic.
Fixes: e18aba8941b40b ("net: lan966x: add mactable support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Horatiu Vultur [Tue, 25 Jan 2022 11:48:15 +0000 (12:48 +0100)]
net: lan966x: Fix sleep in atomic context when injecting frames
On lan966x, when injecting a frame it was polling the register
QS_INJ_STATUS to see if it can continue with the injection of the frame.
The problem was that it was using readx_poll_timeout which could sleep
in atomic context.
This patch fixes this issue by using readx_poll_timeout_atomic.
Fixes: d28d6d2e37d10d ("net: lan966x: add port module support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Jan 2022 15:40:02 +0000 (15:40 +0000)]
Merge branch 'dev_addr-const-fixes'
Jakub Kicinski says:
====================
ethernet: fix some esoteric drivers after netdev->dev_addr constification
Looking at recent fixes for drivers which don't get included with
allmodconfig builds I thought it's worth grepping for more instances of:
dev->dev_addr\[.*\] =
This set contains the fixes.
v2: add last 3 patches which fix drivers for the RiscPC ARM platform.
Thanks to Arnd Bergmann for explaining how to build test that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:38:01 +0000 (16:38 -0800)]
ethernet: seeq/ether3: don't write directly to netdev->dev_addr
netdev->dev_addr is const now.
Compile tested rpc_defconfig w/ GCC 8.5.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:38:00 +0000 (16:38 -0800)]
ethernet: 8390/etherh: don't write directly to netdev->dev_addr
netdev->dev_addr is const now.
Compile tested rpc_defconfig w/ GCC 8.5.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:37:59 +0000 (16:37 -0800)]
ethernet: i825xx: don't write directly to netdev->dev_addr
netdev->dev_addr is const now.
Compile tested rpc_defconfig w/ GCC 8.5.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:37:58 +0000 (16:37 -0800)]
ethernet: broadcom/sb1250-mac: don't write directly to netdev->dev_addr
netdev->dev_addr is const now.
Compile tested bigsur_defconfig and sb1250_swarm_defconfig.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:37:57 +0000 (16:37 -0800)]
ethernet: tundra: don't write directly to netdev->dev_addr
netdev->dev_addr is const now.
Maintain the questionable offsetting in ndo_set_mac_address.
Compile tested holly_defconfig and mpc7448_hpc2_defconfig.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 26 Jan 2022 00:37:56 +0000 (16:37 -0800)]
ethernet: 3com/typhoon: don't write directly to netdev->dev_addr
This driver casts off the const and writes directly to netdev->dev_addr.
This will result in a MAC address tree corruption and a warning.
Compile tested ppc6xx_defconfig.
Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_htb: Fail on unsupported parameters when offload is requested
The current implementation of HTB offload doesn't support some
parameters. Instead of ignoring them, actively return the EINVAL error
when they are set to non-defaults.
As this patch goes to stable, the driver API is not changed here. If
future drivers support more offload parameters, the checks can be moved
to the driver side.
Note that the buffer and cbuffer parameters are also not supported, but
the tc userspace tool assigns some default values derived from rate and
ceil, and identifying these defaults in sch_htb would be unreliable, so
they are still ignored.
Yufeng Mo [Tue, 25 Jan 2022 07:03:12 +0000 (15:03 +0800)]
net: hns3: handle empty unknown interrupt for VF
Since some interrupt states may be cleared by hardware, the driver
may receive an empty interrupt. Currently, the VF driver directly
disables the vector0 interrupt in this case. As a result, the VF
is unavailable. Therefore, the vector0 interrupt should be enabled
in this case.
Fixes: b90fcc5bd904 ("net: hns3: add reset handling for VF when doing Core/Global/IMP reset") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 24 Jan 2022 17:22:49 +0000 (09:22 -0800)]
net: fec_mpc52xx: don't discard const from netdev->dev_addr
Recent changes made netdev->dev_addr const, and it's passed
directly to mpc52xx_fec_set_paddr().
Similar problem exists on the probe patch, the driver needs
to call eth_hw_addr_set().
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: adeef3e32146 ("net: constify netdev->dev_addr") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The cpsw driver didn't properly initialise the struct page_pool_params
before calling page_pool_create(), which leads to crashes after the struct
has been expanded with new parameters.
The second Fixes tag below is where the buggy code was introduced, but
because the code was moved around this patch will only apply on top of the
commit in the first Fixes tag.
Fixes: c5013ac1dd0e ("net: ethernet: ti: cpsw: move set of common functions in cpsw_priv") Fixes: 9ed4050c0d75 ("net: ethernet: ti: cpsw: add XDP support") Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jisheng Zhang [Sun, 23 Jan 2022 15:54:58 +0000 (23:54 +0800)]
net: stmmac: reduce unnecessary wakeups from eee sw timer
Currently, on EEE capable platforms, if EEE SW timer is used, the SW
timer cause 1 wakeup/s even if the TX has successfully entered EEE.
Remove this unnecessary wakeup by only calling mod_timer() if we
haven't successfully entered EEE.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 24 Jan 2022 20:17:58 +0000 (12:17 -0800)]
Merge tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-01-24
The first patch updates the email address of Brian Silverman from his
former employer to his private address.
The next patch fixes DT bindings information for the tcan4x5x SPI CAN
driver.
The following patch targets the m_can driver and fixes the
introduction of FIFO bulk read support.
Another patch for the tcan4x5x driver, which fixes the max register
value for the regmap config.
The last patch for the flexcan driver marks the RX mailbox support for
the MCF5441X as support.
* tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: flexcan: mark RX via mailboxes as supported on MCF5441X
can: tcan4x5x: regmap: fix max register value
can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
mailmap: update email address of Brian Silverman
====================
can: flexcan: mark RX via mailboxes as supported on MCF5441X
Most flexcan IP cores support 2 RX modes:
- FIFO
- mailbox
The flexcan IP core on the MCF5441X cannot receive CAN RTR messages
via mailboxes. However the mailbox mode is more performant. The commit
| 1c45f5778a3b ("can: flexcan: add ethtool support to change rx-rtr setting during runtime")
added support to switch from FIFO to mailbox mode on these cores.
After testing the mailbox mode on the MCF5441X by Angelo Dureghello,
this patch marks it (without RTR capability) as supported. Further the
IP core overview table is updated, that RTR reception via mailboxes is
not supported.
The MRAM of the tcan4x5x has a size of 2K and starts at 0x8000. There
are no further registers in the tcan4x5x making 0x87fc the biggest
addressable register.
This patch fixes the max register value of the regmap config from
0x8ffc to 0x87fc.
can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
In order to optimize FIFO access, especially on m_can cores attached
to slow busses like SPI, in patch
| e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors")
bulk read/write support has been added to the m_can_fifo_{read,write}
functions.
That change leads to the tcan driver to call
regmap_bulk_{read,write}() with a length of 0 (for CAN frames with 0
data length). regmap treats this as an error:
| tcan4x5x spi1.0 tcan4x5x0: FIFO write returned -22
This patch fixes the problem by not calling the
cdev->ops->{read,write)_fifo() in case of a 0 length read/write.
Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors") Link: https://lore.kernel.org/all/20220114155751.2651888-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Cc: Matt Kline <matt@bitbashing.io> Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com> Reported-by: Michael Anochin <anochin@photo-meter.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
net: atlantic: Use the bitmap API instead of hand-writing it
Simplify code by using bitmap_weight() and bitmap_zero() instead of
hand-writing these functions.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 22 Jan 2022 11:40:56 +0000 (06:40 -0500)]
ping: fix the sk_bound_dev_if match in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
the selftests 'router_broadcast.sh' will fail, as such command
# ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
can't receive the response skb by the PING socket. It's caused by mismatch
of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
as dif is vrf-h1 if dif's master was set to vrf-h1.
This patch is to fix this regression by also checking the sk_bound_dev_if
against sdif so that the packets can stil be received even if the socket
is not bound to the vrf device but to the real iif.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tries to fix it by holding clcsock_release_lock and
checking whether clcsock has already been released before access.
In case that a crash of the same reason happens in smc_getsockopt()
or smc_switch_to_fallback(), this patch also checkes smc->clcsock
in them too. And the caller of smc_switch_to_fallback() will identify
whether fallback succeeds according to the return value.
With previous bug fix, ->wait_capability flag is no longer needed and can
be removed.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic_tasklet() continuously spins waiting for responses to all
capability requests. It does this to avoid encountering an error
during initialization of the vnic. However if there is a bug in the
VIOS and we do not receive a response to one or more queries the
tasklet ends up spinning continuously leading to hard lock ups.
If we fail to receive a message from the VIOS it is reasonable to
timeout the login attempt rather than spin indefinitely in the tasklet.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should
send out the next protocol message type. i.e when we get back responses
to all our QUERY_CAPABILITY CRQs we send out REQUEST_CAPABILITY crqs.
Similiary, when we get responses to all the REQUEST_CAPABILITY crqs, we
send out the QUERY_IP_OFFLOAD CRQ.
We currently increment ->running_cap_crqs as we send out each CRQ and
have the ibmvnic_tasklet() send out the next message type, when this
running_cap_crqs count drops to 0.
This assumes that all the CRQs of the current type were sent out before
the count drops to 0. However it is possible that we send out say 6 CRQs,
get preempted and receive all the 6 responses before we send out the
remaining CRQs. This can result in ->running_cap_crqs count dropping to
zero before all messages of the current type were sent and we end up
sending the next protocol message too early.
Instead initialize the ->running_cap_crqs upfront so the tasklet will
only send the next protocol message after all responses are received.
Use the cap_reqs local variable to also detect any discrepancy (either
now or in future) in the number of capability requests we actually send.
Currently only send_query_cap() is affected by this behavior (of sending
next message early) since it is called from the worker thread (during
reset) and from application thread (during ->ndo_open()) and they can be
preempted. send_request_cap() is only called from the tasklet which
processes CRQ responses sequentially, is not be affected. But to
maintain the existing symmtery with send_query_capability() we update
send_request_capability() also.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If auto-priority-failover (APF) is enabled and there are at least two
backing devices of different priorities, some resets like fail-over,
change-param etc can cause at least two back to back failovers. (Failover
from high priority backing device to lower priority one and then back
to the higher priority one if that is still functional).
Depending on the timimg of the two failovers it is possible to trigger
a "hard" reset and for the hard reset to fail due to failovers. When this
occurs, the driver assumes that the network is unstable and disables the
VNIC for a 60-second "settling time". This in turn can cause the ethtool
command to fail with "No such device" while the vnic automatically recovers
a little while later.
Given that it's possible to have two back to back failures, allow for extra
failures before disabling the vnic for the settling time.
Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 22 Jan 2022 00:57:31 +0000 (16:57 -0800)]
ipv4: fix ip option filtering for locally generated fragments
During IP fragmentation we sanitize IP options. This means overwriting
options which should not be copied with NOPs. Only the first fragment
has the original, full options.
ip_fraglist_prepare() copies the IP header and options from previous
fragment to the next one. Commit 19c3401a917b ("net: ipv4: place control
buffer handling away from fragmentation iterators") moved sanitizing
options before ip_fraglist_prepare() which means options are sanitized
and then overwritten again with the old values.
Fixing this is not enough, however, nor did the sanitization work
prior to aforementioned commit.
ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen
for the length of the options. ipcb->opt of fragments is not populated
(it's 0), only the head skb has the state properly built. So even when
called at the right time ip_options_fragment() does nothing. This seems
to date back all the way to v2.5.44 when the fast path for pre-fragmented
skbs had been introduced. Prior to that ip_options_build() would have been
called for every fragment (in fact ever since v2.5.44 the fragmentation
handing in ip_options_build() has been dead code, I'll clean it up in
-next).
In the original patch (see Link) caixf mentions fixing the handling
for fragments other than the second one, but I'm not sure how _any_
fragment could have had their options sanitized with the code
as it stood.
Tested with python (MTU on lo lowered to 1000 to force fragmentation):
IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost.36500 > localhost.search-agent: UDP, length 2000
IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost > localhost: udp
IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost > localhost: udp
After:
IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960
IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256))
localhost > localhost: udp
IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256))
localhost > localhost: udp
RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".
Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/ Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: caixf <ooppublic@163.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jianguo Wu [Fri, 21 Jan 2022 09:15:31 +0000 (17:15 +0800)]
net-procfs: show net devices bound packet types
After commit:7866a621043f ("dev: add per net_device packet type chains"),
we can not get packet types that are bound to a specified net device by
/proc/net/ptype, this patch fix the regression.
Run "tcpdump -i ens192 udp -nns0" Before and after apply this patch:
Before:
[root@localhost ~]# cat /proc/net/ptype
Type Device Function
0800 ip_rcv
0806 arp_rcv
86dd ipv6_rcv
After:
[root@localhost ~]# cat /proc/net/ptype
Type Device Function
ALL ens192 tpacket_rcv
0800 ip_rcv
0806 arp_rcv
86dd ipv6_rcv
v1 -> v2:
- fix the regression rather than adding new /proc API as
suggested by Stephen Hemminger.
Fixes: 7866a621043f ("dev: add per net_device packet type chains") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 21 Jan 2022 08:25:18 +0000 (16:25 +0800)]
bonding: use rcu_dereference_rtnl when get bonding active slave
bond_option_active_slave_get_rcu() should not be used in rtnl_mutex as it
use rcu_dereference(). Replace to rcu_dereference_rtnl() so we also can use
this function in rtnl protected context.
With this update, we can rmeove the rcu_read_lock/unlock in
bonding .ndo_eth_ioctl and .get_ts_info.
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Wed, 19 Jan 2022 16:44:55 +0000 (17:44 +0100)]
net: sfp: ignore disabled SFP node
Commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices
and sfp cages") added code which finds SFP bus DT node even if the node
is disabled with status = "disabled". Because of this, when phylink is
created, it ends with non-null .sfp_bus member, even though the SFP
module is not probed (because the node is disabled).
We need to ignore disabled SFP bus node.
Fixes: ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages") Signed-off-by: Marek Behún <kabel@kernel.org> Cc: stable@vger.kernel.org # 2203cbf2c8b5 ("net: sfp: move fwnode parsing into sfp-bus layer") Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Iurman [Fri, 21 Jan 2022 17:34:49 +0000 (18:34 +0100)]
selftests: net: ioam: expect support for Queue depth data
The IOAM queue-depth data field was added a few weeks ago,
but the test unit was not updated accordingly.
Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: b63c5478e9cb ("ipv6: ioam: Support for Queue depth data field") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://lore.kernel.org/r/20220121173449.26918-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Fri, 21 Jan 2022 07:39:35 +0000 (23:39 -0800)]
mptcp: Use struct_group() to avoid cross-field memset()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.
Use struct_group() to capture the fields to be reset, so that memset()
can be appropriately bounds-checked by the compiler.
Cc: Matthieu Baerts <matthieu.baerts@tessares.net> Cc: mptcp@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20220121073935.1154263-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Howells [Fri, 21 Jan 2022 23:12:58 +0000 (23:12 +0000)]
rxrpc: Adjust retransmission backoff
Improve retransmission backoff by only backing off when we retransmit data
packets rather than when we set the lost ack timer.
To this end:
(1) In rxrpc_resend(), use rxrpc_get_rto_backoff() when setting the
retransmission timer and only tell it that we are retransmitting if we
actually have things to retransmit.
Note that it's possible for the retransmission algorithm to race with
the processing of a received ACK, so we may see no packets needing
retransmission.
(2) In rxrpc_send_data_packet(), don't bump the backoff when setting the
ack_lost_at timer, as it may then get bumped twice.
With this, when looking at one particular packet, the retransmission
intervals were seen to be 1.5ms, 2ms, 3ms, 5ms, 9ms, 17ms, 33ms, 71ms,
136ms, 264ms, 544ms, 1.088s, 2.1s, 4.2s and 8.3s.
Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Suggested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/164138117069.2023386.17446904856843997127.stgit@warthog.procyon.org.uk/ Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Jan 2022 14:32:21 +0000 (14:32 +0000)]
Merge branch 'octeontx2-af-fixes'
Subbaraya Sundeep says:
====================
octeontx-af2: Fixes for CN10K and CN9xxx platforms
This patchset has consolidated fixes in Octeontx2 driver
handling CN10K and CN9xxx platforms. When testing the
new CN10K hardware some issues resurfaced like accessing
wrong register for CN10K and enabling loopback on not supported
interfaces. Some fixes are needed for CN9xxx platforms as well.
Below is the description of patches
Patch 1: AF sets RX RSS action for all the VFs when a VF is
brought up. But when a PF sets RX action for its VF like Drop/Direct
to a queue in ntuple filter it is not retained because of AF fixup.
This patch skips modifying VF RX RSS action if PF has already
set its action.
Patch 2: When configuring backpressure wrong register is being read for
LBKs hence fixed it.
Patch 3: Some RVU blocks may take longer time to reset but are guaranteed
to complete the reset. Hence wait till reset is complete.
Patch 4: For enabling LMAC CN10K needs another register compared
to CN9xxx platforms. Hence changed it.
Patch 5: Adds missing barrier before submitting memory pointer
to the aura hardware.
Patch 6: Increase polling time while link credit restore and also
return proper error code when timeout occurs.
Patch 7: Internal loopback not supported on LPCS interfaces like
SGMII/QSGMII so do not enable it.
Patch 8: When there is a error in message processing, AF sets the error
response and replies back to requestor. PF forwards a invalid message to
VF back if AF reply has error in it. This way VF lacks the actual error set
by AF for its message. This is changed such that PF simply forwards the
actual reply and let VF handle the error.
Patch 9: ntuple filter with "flow-type ether proto 0x8842 vlan 0x92e"
was not working since ethertype 0x8842 is NGIO protocol. Hardware
parser explicitly parses such NGIO packets and sets the packet as
NGIO and do not set it as tagged packet. Fix this by changing parser
such that it sets the packet as both NGIO and tagged by using
separate layer types.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kiran Kumar K [Fri, 21 Jan 2022 06:34:47 +0000 (12:04 +0530)]
octeontx2-af: Add KPU changes to parse NGIO as separate layer
With current KPU profile NGIO is being parsed along with CTAG as
a single layer. Because of this MCAM/ntuple rules installed with
ethertype as 0x8842 are not being hit. Adding KPU profile changes
to parse NGIO in separate ltype and CTAG in separate ltype.
Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes") Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PF forwards its VF messages to AF and corresponding
replies from AF to VF. AF sets proper error code in the
replies after processing message requests. Currently PF
checks the error codes in replies and sends invalid
message to VF. This way VF lacks the information of
error code set by AF for its messages. This patch
changes that such that PF simply forwards AF replies
so that VF can handle error codes.
Fixes: d424b6c02415 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Fri, 21 Jan 2022 06:34:44 +0000 (12:04 +0530)]
octeontx2-af: Increase link credit restore polling timeout
It's been observed that sometimes link credit restore takes
a lot of time than the current timeout. This patch increases
the default timeout value and return the proper error value
on failure.
Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Fri, 21 Jan 2022 06:34:43 +0000 (12:04 +0530)]
octeontx2-pf: cn10k: Ensure valid pointers are freed to aura
While freeing SQB pointers to aura, driver first memcpy to
target address and then triggers lmtst operation to free pointer
to the aura. We need to ensure(by adding dmb barrier)that memcpy
is finished before pointers are freed to the aura. This patch also
adds the missing sq context structure entry in debugfs.
Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Fri, 21 Jan 2022 06:34:42 +0000 (12:04 +0530)]
octeontx2-af: cn10k: Use appropriate register for LMAC enable
CN10K platforms uses RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG
register for lmac TX/RX enable whereas CN9xxx platforms use
CGX_CMRX_CONFIG register. This config change was missed when
adding support for CN10K RPM.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Fri, 21 Jan 2022 06:34:40 +0000 (12:04 +0530)]
octeontx2-af: Fix LBK backpressure id count
In rvu_nix_get_bpid() lbk_bpid_cnt is being read from
wrong register. Due to this backpressure enable is failing
for LBK VF32 onwards. This patch fixes that.
Fixes: fe1939bb2340 ("octeontx2-af: Add SDP interface support") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
AF modifies all the rules destined for VF to use
the action same as default RSS action. This fixup
was needed because AF only installs default rules with
RSS action. But the action in rules installed by a PF
for its VFs should not be changed by this fixup.
This is because action can be drop or direct to
queue as specified by user(ntuple filters).
This patch fixes that problem.
Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 21 Jan 2022 00:35:29 +0000 (16:35 -0800)]
selftests: mptcp: fix ipv6 routing setup
MPJ ipv6 selftests currently lack per link route to the server
net. Additionally, ipv6 subflows endpoints are created without any
interface specified. The end-result is that in ipv6 self-tests
subflows are created all on the same link, leading to expected delays
and sporadic self-tests failures.
Fix the issue by adding the missing setup bits.
Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases") Reported-and-tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Geliang Tang [Fri, 21 Jan 2022 00:35:28 +0000 (16:35 -0800)]
mptcp: fix removing ids bitmap setting
In mptcp_pm_nl_rm_addr_or_subflow(), the bit of rm_list->ids[i] in the
id_avail_bitmap should be set, not rm_list->ids[1]. This patch fixed it.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 21 Jan 2022 00:35:27 +0000 (16:35 -0800)]
mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
The MPTCP endpoint list is under RCU protection, guarded by the
pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list
without acquiring the spin-lock nor under the RCU critical section.
This change addresses the issue performing the lookup and the endpoint
update under the pernet spinlock.
Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
The Fixes tag I chose is probably arbitrary, I do not think
we need to backport this patch to older kernels.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Thu, 20 Jan 2022 12:34:40 +0000 (14:34 +0200)]
tcp: Add a stub for sk_defer_free_flush()
When compiling the kernel with CONFIG_INET disabled, the
sk_defer_free_flush() should be defined as a nop.
This resolves the following compilation error:
ld: net/core/sock.o: in function `sk_defer_free_flush':
./include/net/tcp.h:1378: undefined reference to `__sk_defer_free_flush'
Fixes: 79074a72d335 ("net: Flush deferred skb free on socket destroy") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220120123440.9088-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Thu, 9 Dec 2021 01:56:33 +0000 (17:56 -0800)]
i40e: fix unsigned stat widths
Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match
the width of the stats counters in struct i40e_rx_queue_stats.
Update debugfs code to use the correct format specifier for u64.
Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Joe Damato <jdamato@fastly.com> Reported-by: kernel test robot <lkp@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Karen Sornek [Thu, 2 Dec 2021 11:52:01 +0000 (12:52 +0100)]
i40e: Fix for failed to init adminq while VF reset
Fix for failed to init adminq: -53 while VF is resetting via MAC
address changing procedure.
Added sync module to avoid reading deadbeef value in reinit adminq
during software reset.
Without this patch it is possible to trigger VF reset procedure
during reinit adminq. This resulted in an incorrect reading of
value from the AQP registers and generated the -53 error.
Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This was caused by PF queue pile fragmentation due to
flow director VSI queue being placed right after main VSI.
Because of this main VSI was not able to resize its
queue allocation for XDP resulting in no queues allocated
for main VSI when XDP was turned on.
Fix this by always allocating last queue in PF queue pile
for a flow director VSI.
Fixes: 41c445ff0f48 ("i40e: main driver core") Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Before this patch VF interface vanished when
maximum queue number was exceeded. Driver tried
to add next queues even if there was not enough
space. PF sent incorrect number of queues to
the VF when there were not enough of them.
Add an additional condition introduced to check
available space in 'qp_pile' before proceeding.
This condition makes it impossible to add queues
if they number is greater than the number resulting
from available space.
Also add the search for free space in PF queue
pair piles.
Without this patch VF interfaces are not seen
when available space for queues has been
exceeded and following logs appears permanently
in dmesg:
"Unable to get VF config (-32)".
"VF 62 failed opcode 3, retval: -5"
"Unable to get VF config due to PF error condition, not retrying"
Fixes: 7daa6bf3294e ("i40e: driver core headers") Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e: Increase delay to 1 s after global EMP reset
Recently simplified i40e_rebuild causes that FW sometimes
is not ready after NVM update, the ping does not return.
Increase the delay in case of EMP reset.
Old delay of 300 ms was introduced for specific cards for 710 series.
Now it works for all the cards and delay was increased.
Fixes: 1fa51a650e1d ("i40e: Add delay after EMP reset for firmware to recover") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
David S. Miller [Thu, 20 Jan 2022 11:58:45 +0000 (11:58 +0000)]
Merge branch 'stmmac-fixes'
Yuji Ishikawa says:
====================
net: stmmac: dwmac-visconti: Fix bit definitions and clock configuration for RMII mode
This series is a fix for RMII/MII operation mode of the dwmac-visconti driver.
It is composed of two parts:
* 1/2: fix constant definitions for cleared bits in ETHER_CLK_SEL register
* 2/2: fix configuration of ETHER_CLK_SEL register for running in RMII operation mode.
net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
v1 -> v2:
- added Fixes tag to commit message
net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
v1 -> v2:
- added Fixes tag to commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Tal [Thu, 20 Jan 2022 09:55:50 +0000 (11:55 +0200)]
ethtool: Fix link extended state for big endian
The link extended sub-states are assigned as enum that is an integer
size but read from a union as u8, this is working for small values on
little endian systems but for big endian this always give 0. Fix the
variable in the union to match the enum size.
Fixes: ecc31c60240b ("ethtool: Add link extended state") Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Tue, 18 Jan 2022 21:52:43 +0000 (15:52 -0600)]
net: phy: broadcom: hook up soft_reset for BCM54616S
A problem was encountered with the Bel-Fuse 1GBT-SFP05 SFP module (which
is a 1 Gbps copper module operating in SGMII mode with an internal
BCM54616S PHY device) using the Xilinx AXI Ethernet MAC core, where the
module would work properly on the initial insertion or boot of the
device, but after the device was rebooted, the link would either only
come up at 100 Mbps speeds or go up and down erratically.
I found no meaningful changes in the PHY configuration registers between
the working and non-working boots, but the status registers seemed to
have a lot of error indications set on the SERDES side of the device on
the non-working boot. I suspect the problem is that whatever happens on
the SGMII link when the device is rebooted and the FPGA logic gets
reloaded ends up putting the module's onboard PHY into a bad state.
Since commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
the genphy_soft_reset call is not made automatically by the PHY core
unless the callback is explicitly specified in the driver structure. For
most of these Broadcom devices, there is probably a hardware reset that
gets asserted to reset the PHY during boot, however for SFP modules
(where the BCM54616S is commonly found) no such reset line exists, so if
the board keeps the SFP cage powered up across a reboot, it will end up
with no reset occurring during reboots.
Hook up the genphy_soft_reset callback for BCM54616S to ensure that a
PHY reset is performed before the device is initialized. This appears to
fix the issue with erratic operation after a reboot with this SFP
module.
Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Nogueira [Tue, 18 Jan 2022 17:19:09 +0000 (14:19 -0300)]
net: sched: Clarify error message when qdisc kind is unknown
When adding a tc rule with a qdisc kind that is not supported or not
compiled into the kernel, the kernel emits the following error: "Error:
Specified qdisc not found.". Found via tdc testing when ETS qdisc was not
compiled in and it was not obvious right away what the message meant
without looking at the kernel code.
Change the error message to be more explicit and say the qdisc kind is
unknown.
Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Congyu Liu [Tue, 18 Jan 2022 19:20:13 +0000 (14:20 -0500)]
net: fix information leakage in /proc/net/ptype
In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading `/proc/net/ptype`
file. This is minor information leakage as packet socket is
namespace aware.
Add a net pointer in `packet_type` to keep the net namespace of
of corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.
Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.") Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Jan 2022 08:57:05 +0000 (10:57 +0200)]
Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf.
Quite a handful of old regression fixes but most of those are
pre-5.16.
Current release - regressions:
- fix memory leaks in the skb free deferral scheme if upper layer
protocols are used, i.e. in-kernel TCP readers like TLS
Current release - new code bugs:
- nf_tables: fix NULL check typo in _clone() functions
- change the default to y for Vertexcom vendor Kconfig
- a couple of fixes to incorrect uses of ref tracking
- two fixes for constifying netdev->dev_addr
Previous releases - regressions:
- bpf:
- various verifier fixes mainly around register offset handling
when passed to helper functions
- fix mount source displayed for bpffs (none -> bpffs)
- bonding:
- fix extraction of ports for connection hash calculation
- fix bond_xmit_broadcast return value when some devices are down
- htb: restore minimal packet size handling in rate control
- sfp: fix high power modules without diagnostic monitoring
- mscc: ocelot:
- don't let phylink re-enable TX PAUSE on the NPI port
- don't dereference NULL pointers with shared tc filters
- smsc95xx: correct reset handling for LAN9514
- cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
- phy: micrel: use kszphy_suspend/_resume for irq aware devices,
avoid races with the interrupt
Previous releases - always broken:
- xdp: check prog type before updating BPF link
- smc: resolve various races around abnormal connection termination
- sit: allow encapsulated IPv6 traffic to be delivered locally
- axienet: fix init/reset handling, add missing barriers, read the
right status words, stop queues correctly
- add missing dev_put() in sock_timestamping_bind_phc()
Misc:
- ipv4: prevent accidentally passing RTO_ONLINK to
ip_route_output_key_hash() by sanitizing flags
- ipv4: avoid quadratic behavior in netns dismantle
- stmmac: dwmac-oxnas: add support for OX810SE
- fsl: xgmac_mdio: add workaround for erratum A-009885"
* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
ipv4: avoid quadratic behavior in netns dismantle
net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
dt-bindings: net: Document fsl,erratum-a009885
net/fsl: xgmac_mdio: Add workaround for erratum A-009885
net: mscc: ocelot: fix using match before it is set
net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
net: axienet: increase default TX ring size to 128
net: axienet: fix for TX busy handling
net: axienet: fix number of TX ring slots for available check
net: axienet: Fix TX ring slot available check
net: axienet: limit minimum TX ring size
net: axienet: add missing memory barriers
net: axienet: reset core on initialization prior to MDIO access
net: axienet: Wait for PhyRstCmplt after core reset
net: axienet: increase reset timeout
bpf, selftests: Add ringbuf memory type confusion test
...
Linus Torvalds [Thu, 20 Jan 2022 08:41:01 +0000 (10:41 +0200)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
Colin Ian King [Thu, 20 Jan 2022 02:10:38 +0000 (18:10 -0800)]
lib: remove redundant assignment to variable ret
The variable ret is being assigned a value that is never read. If the
for-loop is entered then ret is immediately re-assigned a new value. If
the for-loop is not executed ret is never read. The assignment is
redundant and can be removed.
Link: https://lkml.kernel.org/r/20211230134557.83633-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Thu, 20 Jan 2022 02:10:35 +0000 (18:10 -0800)]
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
The object-size sanitizer is redundant to -Warray-bounds, and
inappropriately performs its checks at run-time when all information
needed for the evaluation is available at compile-time, making it quite
difficult to use:
Marco Elver [Thu, 20 Jan 2022 02:10:31 +0000 (18:10 -0800)]
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
Until recent versions of GCC and Clang, it was not possible to disable
KCOV instrumentation via a function attribute. The relevant function
attribute was introduced in 540540d06e9d9 ("kcov: add
__no_sanitize_coverage to fix noinstr for all architectures").
x86 was the first architecture to want a working noinstr, and at the
time no compiler support for the attribute existed yet. Therefore,
commit 0f1441b44e823 ("objtool: Fix noinstr vs KCOV") introduced the
ability to NOP __sanitizer_cov_*() calls in .noinstr.text.
However, this doesn't work for other architectures like arm64 and s390
that want a working noinstr per ARCH_WANTS_NO_INSTR.
At the time of 0f1441b44e823, we didn't yet have ARCH_WANTS_NO_INSTR,
but now we can move the Kconfig dependency checks to the generic KCOV
option. KCOV will be available if:
- architecture does not care about noinstr, OR
- we have objtool support (like on x86), OR
- GCC is 12.0 or newer, OR
- Clang is 13.0 or newer.
Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
Commit b05fbcc36be1 ("btrfs: disable build on platforms having page size
256K") disabled btrfs for configurations that used a 256kB page size.
However, it did not fully solve the problem because CONFIG_TEST_KMOD
selects CONFIG_BTRFS, which does not account for the dependency. This
results in a Kconfig warning and the failed BUILD_BUG_ON error
returning.
WARNING: unmet direct dependencies detected for BTRFS_FS
Depends on [n]: BLOCK [=y] && !PPC_256K_PAGES && !PAGE_SIZE_256KB [=y]
Selected by [m]:
- TEST_KMOD [=m] && RUNTIME_TESTING_MENU [=y] && m && MODULES [=y] && NETDEVICES [=y] && NET_CORE [=y] && INET [=y] && BLOCK [=y]
To resolve this, add CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency of
CONFIG_TEST_KMOD so there is no more invalid configuration or build
errors.
Link: https://lkml.kernel.org/r/20211129230141.228085-4-nathan@kernel.org Fixes: b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
btrfs: use generic Kconfig option for 256kB page size limit
Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe
the dependency introduced by commit b05fbcc36be1 ("btrfs: disable build
on platforms having page size 256K").
Link: https://lkml.kernel.org/r/20211129230141.228085-3-nathan@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: kernel test robot <lkp@intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
Patch series "Fix CONFIG_TEST_KMOD with 256kB page size".
The kernel test robot reported a build error [1] from a failed assertion
in fs/btrfs/inode.c with a hexagon randconfig that includes
CONFIG_PAGE_SIZE_256KB. This error is the same one that was addressed
by commit b05fbcc36be1 ("btrfs: disable build on platforms having page
size 256K") but CONFIG_TEST_KMOD selects CONFIG_BTRFS without having the
"page size less than 256kB dependency", which results in the error
reappearing.
The first patch introduces CONFIG_PAGE_SIZE_LESS_THAN_256KB by splitting
it off from CONFIG_PAGE_SIZE_LESS_THAN_64KB, which was introduced in
commit 1f0e290cc5fd ("arch: Add generic Kconfig option indicating page
size smaller than 64k") for a similar reason in 5.16-rc3.
The second patch uses that configuration option for CONFIG_BTRFS to
reduce duplication.
The third patch resolves the build error by adding
CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency to CONFIG_TEST_KMOD so
that CONFIG_BTRFS does not get enabled under that invalid configuration.
btrfs requires a page size smaller than 256kB. To use that dependency
in other places, introduce CONFIG_PAGE_SIZE_LESS_THAN_256KB and reuse
that dependency in CONFIG_PAGE_SIZE_LESS_THAN_64KB.
Qian Cai [Thu, 20 Jan 2022 02:10:18 +0000 (18:10 -0800)]
configs: introduce debug.config for CI-like setup
Some general debugging features like kmemleak, KASAN, lockdep, UBSAN etc
help fix many viruses like a microscope. On the other hand, those
features are scatter around and mixed up with more situational debugging
options making them difficult to consume properly. This cold help
amplify the general debugging/testing efforts and help establish
sensitive default values for those options across the broad. This could
also help different distros to collaborate on maintaining debug-flavored
kernels.
The config is based on years' experiences running daily CI inside the
largest enterprise Linux distro company to seek regressions on
linux-next builds on different bare-metal and virtual platforms. It can
be used for example,
$ make ARCH=arm64 defconfig debug.config
Since KASAN and KCSAN can't be enabled together, we will need to create
a separate one for KCSAN later as well.
Link: https://lkml.kernel.org/r/20211115134754.7334-1-quic_qiancai@quicinc.com Signed-off-by: Qian Cai <quic_qiancai@quicinc.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: "Stephen Rothwell" <sfr@canb.auug.org.au> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wangyong [Thu, 20 Jan 2022 02:10:15 +0000 (18:10 -0800)]
delayacct: track delays from memory compact
Delay accounting does not track the delay of memory compact. When there
is not enough free memory, tasks can spend a amount of their time
waiting for compact.
To get the impact of tasks in direct memory compact, measure the delay
when allocating memory through memory compact.
CPU count real total virtual total delay total delay average
277 78000000084903948518877296 0.068ms
IO count delay total delay average
0 0 0ms
SWAP count delay total delay average
0 0 0ms
RECLAIM count delay total delay average
5 11088812685 2217ms
THRASHING count delay total delay average
0 0 0ms
COMPACT count delay total delay average
3 72758 0ms
watch: read=0, write=0, cancelled_write=0
Link: https://lkml.kernel.org/r/1638619795-71451-1-git-send-email-wang.yong12@zte.com.cn Signed-off-by: wangyong <wang.yong12@zte.com.cn> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Zhang Wenya <zhang.wenya1@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wangyong [Thu, 20 Jan 2022 02:10:12 +0000 (18:10 -0800)]
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
Add thrashing page cache and direct compact related descriptions and
update the usage of getdelays userspace utility.
The following patches modifications have been updated:
https://lore.kernel.org/all/20190312102002.31737-4-jinpuwang@gmail.com/
https://lore.kernel.org/all/1638619795-71451-1-git-send-email-
wang.yong12@zte.com.cn/
Link: https://lkml.kernel.org/r/1639583021-92977-1-git-send-email-wang.yong12@zte.com.cn Signed-off-by: wangyong <wang.yong12@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Yang [Thu, 20 Jan 2022 02:10:09 +0000 (18:10 -0800)]
delayacct: cleanup flags in struct task_delay_info and functions use it
Flags in struct task_delay_info is used to distinguish the difference
between swapin and blkio delay acountings. But after patch "delayacct:
support swapin delay accounting for swapping without blkio", there is no
need to do that since swapin and blkio delay accounting use their own
functions.
Link: https://lkml.kernel.org/r/20211124065958.36703-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Yang [Thu, 20 Jan 2022 02:10:06 +0000 (18:10 -0800)]
delayacct: fix incomplete disable operation when switch enable to disable
When a task is created after delayacct is enabled, kernel will do all
the delay accountings for that task. The problems is if user disables
delayacct by set /proc/sys/kernel/task_delayacct to zero, only blkio
delay accounting is disabled.
Now disable all the kinds of delay accountings when
/proc/sys/kernel/task_delayacct sets to zero.
Link: https://lkml.kernel.org/r/20211123140342.32962-1-ran.xiaokai@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Yang [Thu, 20 Jan 2022 02:10:02 +0000 (18:10 -0800)]
delayacct: support swapin delay accounting for swapping without blkio
Currently delayacct accounts swapin delay only for swapping that cause
blkio. If we use zram for swapping, tools/accounting/getdelays can't
get any SWAP delay.
It's useful to get zram swapin delay information, for example to adjust
compress algorithm or /proc/sys/vm/swappiness.
Reference to PSI, it accounts any kind of swapping by doing its work in
swap_readpage(), no matter whether swapping causes blkio. Let delayacct
do the similar work.
Link: https://lkml.kernel.org/r/20211112083813.8559-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oops id has been added as part of the end of trace marker for the
kerneloops.org project. The id is used to automatically identify
duplicate submissions of the same report. Identical looking reports
with different a id can be considered as the same oops occurred again.
The early initialisation of the oops_id can create a warning if the
random core is not yet fully initialized. On PREEMPT_RT it is
problematic if the id is initialized on demand from non preemptible
context.
The kernel oops project is not available since 2017. Remove the oops_id
and use 0 in the output in case parser rely on it.
Link: https://bugs.debian.org/953172 Link: https://lkml.kernel.org/r/Ybdi16aP2NEugWHq@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Thu, 20 Jan 2022 02:09:50 +0000 (18:09 -0800)]
FAT: use io_schedule_timeout() instead of congestion_wait()
congestion_wait() in this context is just a sleep - block devices do not
support congestion signalling any more.
The goal for this wait, which was introduced in commit ae78bf9c4f5f
("[PATCH] add -o flush for fat") is to wait for any recently written
data to get to storage. We currently have no direct mechanism to do
this, so a simple wait that behaves identically to the current
congestion_wait() is the best we can do.
Kees Cook [Thu, 20 Jan 2022 02:09:47 +0000 (18:09 -0800)]
hfsplus: use struct_group_attr() for memcpy() region
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Add struct_group() to mark the "info" region (containing struct DInfo
and struct DXInfo structs) in struct hfsplus_cat_folder and struct
hfsplus_cat_file that are written into directly, so the compiler can
correctly reason about the expected size of the writes.
"pahole" shows no size nor member offset changes to struct
hfsplus_cat_folder nor struct hfsplus_cat_file. "objdump -d" shows no
object code changes.
Link: https://lkml.kernel.org/r/20211119192851.1046717-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Thu, 20 Jan 2022 02:09:44 +0000 (18:09 -0800)]
nilfs2: remove redundant pointer sbufs
Pointer sbufs is being assigned a value but it's not being used later
on. The pointer is redundant and can be removed. Cleans up scan-build
static analysis warning:
fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs'
is used in the enclosing expression, the value is never actually read
from 'sbufs' [deadcode.DeadStores]
sbh = sbufs = page_buffers(src);
H.J. Lu [Thu, 20 Jan 2022 02:09:40 +0000 (18:09 -0800)]
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries. This
fixes: