David S. Miller [Wed, 26 Aug 2020 23:00:51 +0000 (16:00 -0700)]
Merge branch 'ipv4-nexthop-Various-improvements'
Ido Schimmel says:
====================
ipv4: nexthop: Various improvements
This patch set contains various improvements that I made to the nexthop
object code while studying it towards my upcoming changes.
While patches #4 and #6 fix bugs, they are not regressions (never
worked). They also do not occur to me as critical issues, which is why I
am targeting them at net-next.
Ido Schimmel [Wed, 26 Aug 2020 16:48:57 +0000 (19:48 +0300)]
selftests: fib_nexthops: Test IPv6 route with group after replacing IPv4 nexthops
Test that an IPv6 route can not use a nexthop group with mixed IPv4 and
IPv6 nexthops, but can use it after replacing the IPv4 nexthops with
IPv6 nexthops.
Output without previous patch:
# ./fib_nexthops.sh -t ipv6_fcnal_runtime
IPv6 functional runtime
-----------------------
TEST: Route add [ OK ]
TEST: Route delete [ OK ]
TEST: Ping with nexthop [ OK ]
TEST: Ping - multipath [ OK ]
TEST: Ping - blackhole [ OK ]
TEST: Ping - blackhole replaced with gateway [ OK ]
TEST: Ping - gateway replaced by blackhole [ OK ]
TEST: Ping - group with blackhole [ OK ]
TEST: Ping - group blackhole replaced with gateways [ OK ]
TEST: IPv6 route with device only nexthop [ OK ]
TEST: IPv6 multipath route with nexthop mix - dev only + gw [ OK ]
TEST: IPv6 route can not have a v4 gateway [ OK ]
TEST: Nexthop replace - v6 route, v4 nexthop [ OK ]
TEST: Nexthop replace of group entry - v6 route, v4 nexthop [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after removing v4 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after replacing v4 gateways [FAIL]
TEST: Nexthop with default route and rpfilter [ OK ]
TEST: Nexthop with multipath default route and rpfilter [ OK ]
Tests passed: 21
Tests failed: 1
Output with previous patch:
# ./fib_nexthops.sh -t ipv6_fcnal_runtime
IPv6 functional runtime
-----------------------
TEST: Route add [ OK ]
TEST: Route delete [ OK ]
TEST: Ping with nexthop [ OK ]
TEST: Ping - multipath [ OK ]
TEST: Ping - blackhole [ OK ]
TEST: Ping - blackhole replaced with gateway [ OK ]
TEST: Ping - gateway replaced by blackhole [ OK ]
TEST: Ping - group with blackhole [ OK ]
TEST: Ping - group blackhole replaced with gateways [ OK ]
TEST: IPv6 route with device only nexthop [ OK ]
TEST: IPv6 multipath route with nexthop mix - dev only + gw [ OK ]
TEST: IPv6 route can not have a v4 gateway [ OK ]
TEST: Nexthop replace - v6 route, v4 nexthop [ OK ]
TEST: Nexthop replace of group entry - v6 route, v4 nexthop [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after removing v4 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after replacing v4 gateways [ OK ]
TEST: Nexthop with default route and rpfilter [ OK ]
TEST: Nexthop with multipath default route and rpfilter [ OK ]
Tests passed: 22
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 26 Aug 2020 16:48:56 +0000 (19:48 +0300)]
ipv4: nexthop: Correctly update nexthop group when replacing a nexthop
Each nexthop group contains an indication if it has IPv4 nexthops
('has_v4'). Its purpose is to prevent IPv6 routes from using groups with
IPv4 nexthops.
However, the indication is not updated when a nexthop is replaced. As a
result, the kernel wrongly rejects IPv6 routes that point to groups that
only contain IPv6 nexthops. Example:
# ip nexthop replace id 1 via 192.0.2.2 dev dummy10
# ip nexthop replace id 10 group 1
# ip nexthop replace id 1 via 2001:db8:1::2 dev dummy10
# ip route replace 2001:db8:10::/64 nhid 10
Error: IPv6 routes can not use an IPv4 nexthop.
Solve this by iterating over all the nexthop groups that the replaced
nexthop is a member of and potentially update their IPv4 indication
according to the new set of member nexthops.
Avoid wasting cycles by only performing the update in case an IPv4
nexthop is replaced by an IPv6 nexthop.
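The replace fix can be modelled in a few lines of userspace C. This is a
sketch only: struct nexthop, struct nh_group and nexthop_replace_family()
below are simplified, hypothetical stand-ins chosen for illustration, not
the kernel's types or functions. nh_group_v4_update() recomputes a group's
has_v4 flag from its current members, and the full rescan only runs when
an IPv4 nexthop becomes IPv6, the case the commit message optimizes for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, simplified stand-ins for the kernel's nexthop types. */
#define MAX_MEMBERS 8
#define MAX_GROUPS  4

struct nh_group;

struct nexthop {
	int family;                          /* AF_INET or AF_INET6 */
	struct nh_group *groups[MAX_GROUPS]; /* groups this nexthop is in */
	size_t num_groups;
};

struct nh_group {
	struct nexthop *members[MAX_MEMBERS];
	size_t num_members;
	bool has_v4; /* true if any member is an IPv4 nexthop */
};

/* Recompute the group's IPv4 indication from its current members. */
static void nh_group_v4_update(struct nh_group *nhg)
{
	bool has_v4 = false;

	for (size_t i = 0; i < nhg->num_members; i++)
		if (nhg->members[i]->family == AF_INET)
			has_v4 = true;
	nhg->has_v4 = has_v4;
}

/* Replace a nexthop's address family in place and fix up its groups. */
static void nexthop_replace_family(struct nexthop *nh, int new_family)
{
	int old_family = nh->family;

	assert(new_family == AF_INET || new_family == AF_INET6);
	nh->family = new_family;

	if (new_family == AF_INET) {
		/* An IPv4 member always makes has_v4 true; no rescan needed. */
		for (size_t i = 0; i < nh->num_groups; i++)
			nh->groups[i]->has_v4 = true;
		return;
	}
	/* IPv6 replaced by IPv6 cannot change has_v4. */
	if (old_family != AF_INET)
		return;
	/* IPv4 -> IPv6: other members may still be IPv4, so rescan. */
	for (size_t i = 0; i < nh->num_groups; i++)
		nh_group_v4_update(nh->groups[i]);
}
```

In the example above, group 10 has nexthop 1 as its only member, so
replacing nexthop 1 with an IPv6 nexthop clears has_v4 and the IPv6 route
can then use the group.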
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 26 Aug 2020 16:48:55 +0000 (19:48 +0300)]
selftests: fib_nexthops: Test IPv6 route with group after removing IPv4 nexthops
Test that an IPv6 route can not use a nexthop group with mixed IPv4 and
IPv6 nexthops, but can use it after deleting the IPv4 nexthops.
Output without previous patch:
# ./fib_nexthops.sh -t ipv6_fcnal_runtime
IPv6 functional runtime
-----------------------
TEST: Route add [ OK ]
TEST: Route delete [ OK ]
TEST: Ping with nexthop [ OK ]
TEST: Ping - multipath [ OK ]
TEST: Ping - blackhole [ OK ]
TEST: Ping - blackhole replaced with gateway [ OK ]
TEST: Ping - gateway replaced by blackhole [ OK ]
TEST: Ping - group with blackhole [ OK ]
TEST: Ping - group blackhole replaced with gateways [ OK ]
TEST: IPv6 route with device only nexthop [ OK ]
TEST: IPv6 multipath route with nexthop mix - dev only + gw [ OK ]
TEST: IPv6 route can not have a v4 gateway [ OK ]
TEST: Nexthop replace - v6 route, v4 nexthop [ OK ]
TEST: Nexthop replace of group entry - v6 route, v4 nexthop [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after deleting v4 gateways [FAIL]
TEST: Nexthop with default route and rpfilter [ OK ]
TEST: Nexthop with multipath default route and rpfilter [ OK ]
Tests passed: 18
Tests failed: 1
Output with previous patch:
# ./fib_nexthops.sh -t ipv6_fcnal_runtime
IPv6 functional runtime
-----------------------
TEST: Route add [ OK ]
TEST: Route delete [ OK ]
TEST: Ping with nexthop [ OK ]
TEST: Ping - multipath [ OK ]
TEST: Ping - blackhole [ OK ]
TEST: Ping - blackhole replaced with gateway [ OK ]
TEST: Ping - gateway replaced by blackhole [ OK ]
TEST: Ping - group with blackhole [ OK ]
TEST: Ping - group blackhole replaced with gateways [ OK ]
TEST: IPv6 route with device only nexthop [ OK ]
TEST: IPv6 multipath route with nexthop mix - dev only + gw [ OK ]
TEST: IPv6 route can not have a v4 gateway [ OK ]
TEST: Nexthop replace - v6 route, v4 nexthop [ OK ]
TEST: Nexthop replace of group entry - v6 route, v4 nexthop [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route can not have a group with v4 and v6 gateways [ OK ]
TEST: IPv6 route using a group after deleting v4 gateways [ OK ]
TEST: Nexthop with default route and rpfilter [ OK ]
TEST: Nexthop with multipath default route and rpfilter [ OK ]
Tests passed: 19
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 26 Aug 2020 16:48:54 +0000 (19:48 +0300)]
ipv4: nexthop: Correctly update nexthop group when removing a nexthop
Each nexthop group contains an indication if it has IPv4 nexthops
('has_v4'). Its purpose is to prevent IPv6 routes from using groups with
IPv4 nexthops.
However, the indication is not updated when a nexthop is removed. As a
result, the kernel wrongly rejects IPv6 routes that point to groups that
only contain IPv6 nexthops. Example:
# ip nexthop replace id 1 via 192.0.2.2 dev dummy10
# ip nexthop replace id 2 via 2001:db8:1::2 dev dummy10
# ip nexthop replace id 10 group 1/2
# ip nexthop del id 1
# ip route replace 2001:db8:10::/64 nhid 10
Error: IPv6 routes can not use an IPv4 nexthop.
Solve this by updating the indication according to the new set of
member nexthops.
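A similarly hedged sketch of the removal case: the hypothetical
nh_group_remove() below builds the group that is left after deleting one
member and derives has_v4 from the surviving members instead of carrying
it over from the old group. The types are simplified stand-ins for
illustration, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, simplified model of a nexthop group member list. */
#define MAX_MEMBERS 8

struct nh {
	int id;
	int family; /* AF_INET or AF_INET6 */
};

struct nh_group {
	const struct nh *members[MAX_MEMBERS];
	size_t num_members;
	bool has_v4;
};

/* Build the group that results from removing the member with id @id.
 * has_v4 is recomputed from the remaining members, so removing the last
 * IPv4 member clears it. */
static struct nh_group nh_group_remove(const struct nh_group *old, int id)
{
	struct nh_group new_grp = { .num_members = 0, .has_v4 = false };

	assert(old->num_members <= MAX_MEMBERS);
	for (size_t i = 0; i < old->num_members; i++) {
		const struct nh *member = old->members[i];

		if (member->id == id)
			continue; /* skip the nexthop being deleted */
		new_grp.members[new_grp.num_members++] = member;
		if (member->family == AF_INET)
			new_grp.has_v4 = true;
	}
	return new_grp;
}
```

With the example above (group 10 = {1 (IPv4), 2 (IPv6)}), removing
nexthop 1 leaves a group whose has_v4 is false.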
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 26 Aug 2020 16:48:52 +0000 (19:48 +0300)]
ipv4: nexthop: Use nla_put_be32() for NHA_GATEWAY
The code correctly uses nla_get_be32() to get the payload of the
attribute, but incorrectly uses nla_put_u32() to add the attribute to
the message. This results in the following sparse warning:
net/ipv4/nexthop.c:279:59: warning: incorrect type in argument 3 (different base types)
net/ipv4/nexthop.c:279:59: expected unsigned int [usertype] value
net/ipv4/nexthop.c:279:59: got restricted __be32 [usertype] ipv4
Suppress the warning by using nla_put_be32().
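Both nla_put_u32() and nla_put_be32() copy the four bytes of their
argument into the attribute unchanged, so the change does not alter what
is sent to user space; it only gives sparse the correct type. A userspace
sketch of that point, where put_u32() and put_be32() are hypothetical
stand-ins for the netlink helpers and be32_t is a plain typedef (plain C
has no equivalent of sparse's restricted __be32 checking):

```c
#include <arpa/inet.h> /* htonl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain typedef standing in for the kernel's sparse-checked __be32. */
typedef uint32_t be32_t;

/* Hypothetical stand-in for nla_put_u32(): copies the value's bytes as-is. */
static void put_u32(unsigned char *payload, uint32_t value)
{
	assert(payload != NULL);
	memcpy(payload, &value, sizeof(value));
}

/* Hypothetical stand-in for nla_put_be32(): also copies the bytes as-is;
 * only the declared (big-endian) type of the argument differs. */
static void put_be32(unsigned char *payload, be32_t value)
{
	assert(payload != NULL);
	memcpy(payload, &value, sizeof(value));
}
```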
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Aug 2020 22:55:54 +0000 (15:55 -0700)]
Merge branch 'net_prefetch-API'
Tariq Toukan says:
====================
net_prefetch API
This patchset adds a common net API for L1 cacheline size-aware prefetch.
Patch 1 introduces the common API in net and aligns the drivers to use it.
Patches 2 and 3 add usage in mlx4 and mlx5 Eth drivers.
Series generated against net-next commit: 7a1c0fdb6dbb Merge tag 'batadv-next-for-davem-20200824' of git://git.open-mesh.org/linux-merge
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 26 Aug 2020 12:54:18 +0000 (15:54 +0300)]
net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES
A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 26 Aug 2020 12:54:17 +0000 (15:54 +0300)]
net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES
A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 26 Aug 2020 12:54:16 +0000 (15:54 +0300)]
net: Take common prefetch code structure into a function
Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Add Ethernet support for Intel Keem Bay SoC
This patch set enables support for Ethernet on the Intel Keem Bay SoC.
The first patch contains the required Device Tree bindings documentation,
while the second patch adds the Intel platform glue layer for the stmmac
device driver.
This driver was tested on the Keem Bay evaluation module board.
Changes since v2:
-Add a select in DT documentation to avoid matching with all nodes containing 'snps,dwmac'
-Rebased to 5.9-rc1
Changes since v1:
-Removed clocks maxItems property from DT bindings documentation
-Removed phy compatible strings from DT bindings documentation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dwmac-intel-plat to enable the stmmac driver in Intel Keem Bay.
Also add fix_mac_speed and tx_clk in order to change link speeds.
This is required as mac_speed_o is not connected in the
Intel Keem Bay SoC.
Signed-off-by: Rusaimi Amira Ruslan <rusaimi.amira.rusaimi@intel.com>
Signed-off-by: Vineetha G. Jaya Kumaran <vineetha.g.jaya.kumaran@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miaohe Lin [Tue, 25 Aug 2020 11:33:22 +0000 (07:33 -0400)]
net: Set ping saddr after we successfully get the ping port
Defer setting the ping saddr until the ping port has been obtained
successfully, so that saddr does not need to be cleared on failure.
Since ping_clear_saddr() is no longer used, remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Rangoju [Tue, 25 Aug 2020 03:55:46 +0000 (09:25 +0530)]
cxgb4: add error handlers to LE intr_handler
cxgb4 does not check the HASHTBLMEMCRCERR and CMDTIDERR bits in the
LE_DB_INT_CAUSE register, even though these are enabled in
LE_DB_INT_ENABLE. Add error handlers to the LE interrupt handler to emit
a warning or alert message for hash table memory CRC errors and command
TID errors.
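A table-driven sketch of this kind of handler, assuming hypothetical bit
positions for HASHTBLMEMCRCERR and CMDTIDERR (the real values live in the
cxgb4 register definitions and are not reproduced here); the names
le_intr_bits and le_intr_handler() are illustrative only. A message is
emitted only for bits that are both set in the cause register and enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_HASHTBLMEMCRCERR (1u << 25)
#define SKETCH_CMDTIDERR        (1u << 22)

struct intr_bit {
	uint32_t mask;
	const char *msg;
};

static const struct intr_bit le_intr_bits[] = {
	{ SKETCH_HASHTBLMEMCRCERR, "LE hash table memory CRC error" },
	{ SKETCH_CMDTIDERR,        "LE command TID error" },
};

/* Report each known cause bit that is both set and enabled.
 * Returns the number of messages emitted. */
static int le_intr_handler(uint32_t cause, uint32_t enable)
{
	uint32_t active = cause & enable;
	int reported = 0;

	for (size_t i = 0; i < sizeof(le_intr_bits) / sizeof(le_intr_bits[0]); i++) {
		if (active & le_intr_bits[i].mask) {
			fprintf(stderr, "warning: %s\n", le_intr_bits[i].msg);
			reported++;
		}
	}
	return reported;
}
```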
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Aug 2020 01:15:45 +0000 (18:15 -0700)]
Merge branch 'Add-PTP-support-for-Octeontx2'
Subbaraya Sundeep says:
====================
Add PTP support for Octeontx2
This patchset adds PTP support for Octeontx2 platform.
PTP is an independent coprocessor block from which
the CGX block fetches a timestamp and prepends it to the
packet before sending it to the NIX block. The patches are
as follows:
Patch 1: Patch to enable/disable packet timestamping
in CGX upon mailbox request. It also adjusts the
packet parser (NPC) for the 8-byte timestamp
appearing before the packet.
Patch 2: Patch adding PTP pci driver which configures
the PTP block and hooks up to RVU AF driver.
It also exposes a mailbox call to adjust PTP
hardware clock.
Patch 3: Patch adding PTP clock driver for PF netdev.
v8:
Added missing header file reported by kernel test robot
in patch 2
v7:
As per Jesse Brandeburg comments:
Simplified functions in patch 1
Replaced magic numbers with macros
Added Copyrights
Added code comments wherever required
Modified commit description of patch 2
v6:
Resent after net-next is open
v5:
As suggested by David separated the fix (adding rtnl lock/unlock)
and submitted to net.
https://www.spinics.net/lists/netdev/msg669617.html
v4:
Added rtnl_lock/unlock in otx2_reset to protect against
network stack ndo_open and close calls
Added NULL check after ptp_clock_register in otx2_ptp.c
v3:
Fixed sparse error in otx2_txrx.c
Removed static inlines in otx2_txrx.c
v2:
Fixed kernel build robot reported error by
adding timecounter.h to otx2_common.h
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksey Makarov [Mon, 24 Aug 2020 15:50:01 +0000 (21:20 +0530)]
octeontx2-af: Add support for Marvell PTP coprocessor
The Precision Timestamping block found on the Octeontx2
platform is an independent coprocessor with an
internal PTP hardware clock. Once configured, PTP
runs independently: when a packet arrives, the
CGX hardware block gets the current timestamp
from the PTP block and forwards the packet to NIX
with the timestamp prepended.
This patch adds the PCI driver for the PTP block.
The driver is registered by the AF driver, does the
initial configuration and exposes a mailbox function to
read and adjust the PTP hardware clock. The mailbox function
is called by AF consumers such as netdev drivers or
userspace drivers. Since PTP is a single block
in the platform, this driver allows any AF consumer
to access it.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zyta Szpak [Mon, 24 Aug 2020 15:50:00 +0000 (21:20 +0530)]
octeontx2-af: Support to enable/disable HW timestamping
Four new mbox message IDs and handlers are added in order to
enable or disable the timestamping procedure on the TX and RX side.
Additionally, when PTP is enabled, the packet parser must skip
over 8 bytes and start analyzing packet data there. To make NPC
profiles work seamlessly, PTR_ADVANCE of IKPU is set so that
all data pointers are shifted by 8 bytes automatically and
parsing can be done as before.
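The effect of the pointer advance can be sketched in userspace C.
Everything here is hypothetical (struct parse_view, parse_view_init() and
parse_ethertype() are illustrative, not NPC or IKPU interfaces): once the
parse start is moved past the 8-byte timestamp, existing offset-based
parsing, such as reading the EtherType at offset 12, works unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the timestamp prepended to each packet when PTP is enabled. */
#define PTP_TSTAMP_LEN 8

/* Hypothetical parser view: offsets are relative to 'start'. */
struct parse_view {
	const uint8_t *start;
	size_t len;
};

/* Advance past the timestamp when PTP is enabled (analogous in spirit to
 * setting PTR_ADVANCE), so all later offsets stay valid. */
static struct parse_view parse_view_init(const uint8_t *buf, size_t len,
					 int ptp_enabled)
{
	struct parse_view v = { buf, len };

	if (ptp_enabled) {
		assert(len >= PTP_TSTAMP_LEN);
		v.start += PTP_TSTAMP_LEN;
		v.len -= PTP_TSTAMP_LEN;
	}
	return v;
}

/* Read the EtherType at its usual Ethernet offset (12), big-endian. */
static uint16_t parse_ethertype(struct parse_view v)
{
	assert(v.len >= 14);
	return (uint16_t)((v.start[12] << 8) | v.start[13]);
}
```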
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Zyta Szpak <zyta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an implementation of the devlink health infrastructure.
With this, we are now able to report HW errors to devlink, which takes
its own actions, depending on user configuration, to capture and store
a dump at the moment the failure occurs and to request the driver to
recover the device.
So far we do not differentiate global device failures from specific PCI
function failures. This means that some errors specific to one physical
function will affect the entire device. This is not yet fully designed
and verified and will be followed up in the future.
The solution was verified with artificially generated HW errors; existing
tools can be used for dump analysis.
v7: comments from Jesse and Jakub
- p2: extra edev check
- p9: removed extra indents
v6: patch 4: changing serial to board.serial and fw to fw.app
v5: improved patch 4 description
v4:
- commit message and other fixes after Jiri's comments
- removed one patch (will send to net)
v3: fix uninit var usage in patch 11
v2: fix #include issue from kbuild test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:34 +0000 (14:19 +0300)]
qede: make driver reliable on unload after failures
If recovery is not successful, the netdev should still be
present, but cdev must be cleared if something goes wrong
during recovery.
Also check cdev for NULL on device close, since it can be NULL
if recovery was not successful.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:33 +0000 (14:19 +0300)]
qed: align adjacent indent
Remove extra indent on some of adjacent declarations.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:32 +0000 (14:19 +0300)]
qed: implement devlink dump
Gather and push out full device dump to devlink.
The device dump is the same as with `ethtool -d`, but it is now generated
exactly at the moment the failure happens.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:31 +0000 (14:19 +0300)]
qed*: make use of devlink recovery infrastructure
Remove forcible recovery trigger and put it as a normal devlink
callback.
This allows the user to enable/disable it via
devlink health set pci/0000:03:00.0 reporter fw_fatal auto_recover false
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:30 +0000 (14:19 +0300)]
qed: use devlink logic to report errors
Use devlink_health_report to push error indications.
We implement this in qede via a callback function, so that the same
mechanism can be reused by other drivers sitting on top of qed in the
future.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:29 +0000 (14:19 +0300)]
qed: health reporter init deinit seq
Here we declare health reporter ops (empty for now)
and register these in qed probe and remove callbacks.
This way we get devlink attached to all kinds of qed* PCI
device entities: networking or storage offload entities.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:28 +0000 (14:19 +0300)]
qed: implement devlink info request
Here we return the existing FW and MFW versions; we also fetch the
device's serial number:
~$ sudo ~/iproute2/devlink/devlink dev info
pci/0000:01:00.1:
  driver qed
  board.serial_number REE1915E44552
  versions:
      running:
        fw.app 8.42.2.0
      stored:
        fw.mgmt 8.52.10.0
MFW and FW are different firmwares on the device.
Management FW (MFW) is responsible for link configuration and
various control plane features. It is permanent and resides in NVM.
Running FW (or fastpath FW) is an embedded microprogram implementing
all the packet processing, offloads, etc. This FW is loaded by the
driver from a FW binary blob on each start.
The base device-specific structure (qed_dev_info) was not directly
available to the base driver before. Thus, here we create and store
a private copy of this structure in the qed_dev root object to
access the data.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:27 +0000 (14:19 +0300)]
qed: fix kconfig help entries
This patch replaces stubs in kconfig help entries with an actual description.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:26 +0000 (14:19 +0300)]
qed/qede: make devlink survive recovery
The devlink instance lifecycle was linked to the qed_dev object,
which caused devlink to be recreated on each recovery.
Change this by making the higher-level driver (qede) responsible for its
lifetime. This way, devlink now survives recoveries.
qede now stores the devlink structure pointer as part of its device
object, and the devlink private data contains a linkage structure,
qed_devlink.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Sun, 23 Aug 2020 11:19:25 +0000 (14:19 +0300)]
qed: move out devlink logic into a new file
We are extending the devlink infrastructure, so move the existing
devlink code into a new file, qed_devlink.c.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'alloc_rx_resources()' and
'alloc_tx_resources()' (sge.c), GFP_KERNEL can be used because it is
already used in these functions.
Moreover, these functions can only be called from a .ndo_open function,
so they are guarded by 'rtnl_lock()', which is a mutex.
While at it, a pr_err message in 'init_one()' has been updated accordingly
(s/consistent/coherent).
David S. Miller [Tue, 25 Aug 2020 00:36:11 +0000 (17:36 -0700)]
Merge branch 'mlxsw-Misc-updates'
Ido Schimmel says:
====================
mlxsw: Misc updates
This patch set includes various updates for mlxsw.
Patches #1-#4 adjust the default burst size of packet trap policers to
conform to Spectrum-{2,3} requirements. The corresponding selftest is
also adjusted so that it could reliably pass on these platforms.
Patch #5 adjusts a selftest so that it could pass with both old and new
versions of mausezahn.
Patch #6 significantly reduces the runtime of tc-police scale test by
changing the preference and masks of the used tc filters.
Patch #7 prevents the driver from trying to set invalid ethtool link
modes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Sun, 23 Aug 2020 08:06:28 +0000 (11:06 +0300)]
mlxsw: spectrum_ethtool: Remove internal speeds from PTYS register
The PTYS register is used to report and configure the port type and
speed. Currently, internal bits in the register are used the same way
other bits are used.
Using the internal bits can cause bad parameter firmware errors. For
example, trying to write to internal bit 25 returns:
Ido Schimmel [Sun, 23 Aug 2020 08:06:27 +0000 (11:06 +0300)]
selftests: mlxsw: Reduce runtime of tc-police scale test
Currently, the test takes about 626 seconds to complete because of an
inefficient use of the device's TCAM. Reduce the runtime to 202 seconds
by inserting all the flower filters with the same preference and mask,
but with a different key.
In particular, this reduces the deletion of the qdisc (which triggers
the deletion of all the filters) from 66 seconds to 0.2 seconds. This
prevents various netlink requests from user space applications (e.g.,
systemd-networkd) from timing out, since RTNL is no longer held for so
long.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Sun, 23 Aug 2020 08:06:26 +0000 (11:06 +0300)]
selftests: forwarding: Fix mausezahn delay parameter in mirror_test()
Currently, the mausezahn delay parameter in mirror_test() is specified
with 'ms' units.
mausezahn versions before 0.6.5 interpret 'ms' as seconds and therefore
the tests that use mirror_test() take a very long time to complete.
Resolve this by specifying 'msec' units.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 23 Aug 2020 08:06:25 +0000 (11:06 +0300)]
selftests: mlxsw: Increase burst size for burst test
The current combination of rate and burst size does not adhere to the
Spectrum-{2,3} limitation, which states that the minimum burst size
should be 40% of the rate.
Increase the burst size in order to honor the above-mentioned limitation
and avoid intermittent failures of this test case on Spectrum-{2,3}.
Remove the first sub-test case as the variation in number of received
packets is simply too large to reliably test it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 23 Aug 2020 08:06:24 +0000 (11:06 +0300)]
selftests: mlxsw: Increase burst size for rate test
The current combination of rate and burst size does not adhere to the
Spectrum-{2,3} limitation, which states that the minimum burst size
should be 40% of the rate.
Increase the burst size in order to honor the above-mentioned limitation
and avoid intermittent failures of this test case on Spectrum-{2,3}.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sun, 23 Aug 2020 08:06:22 +0000 (11:06 +0300)]
mlxsw: spectrum_trap: Adjust default policer burst size for Spectrum-{2, 3}
On the Spectrum-{2,3} ASICs the minimum burst size of the packet trap
policers needs to be 40% of the configured rate. Otherwise, intermittent
drops are observed even when the incoming packet rate is slightly lower
than the configured policer rate.
Adjust the burst size of the registered packet trap policers so that
they do not violate the above-mentioned limitation.
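The 40% rule is simple integer arithmetic. A hedged sketch
(min_burst_for_rate() and burst_is_valid() are hypothetical helpers, not
mlxsw functions) that computes the smallest acceptable burst as
ceil(rate * 40 / 100) = ceil(rate * 2 / 5) without floating point. It
models only the 40% constraint stated above and no other hardware
restriction on the burst value. For example, a rate of 1000 requires a
burst of at least 400.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest burst that is at least 40% of @rate, i.e. ceil(rate * 2 / 5),
 * computed with integers only. Units are whatever the policer uses. */
static uint64_t min_burst_for_rate(uint64_t rate)
{
	assert(rate <= (UINT64_MAX - 4) / 2); /* avoid overflow in rate * 2 + 4 */
	return (rate * 2 + 4) / 5;
}

/* True if @burst satisfies the 40%-of-rate minimum for @rate. */
static bool burst_is_valid(uint64_t rate, uint64_t burst)
{
	return burst >= min_burst_for_rate(rate);
}
```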
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'atl1e_setup_ring_resources()' (atl1e_main.c),
'atl1_setup_ring_resources()' (atl1.c) and 'atl2_setup_ring_resources()'
(atl2.c), GFP_KERNEL can be used because these functions can be called
from a .ndo_open function.
'atl1_setup_ring_resources()' (atl1.c) can also be called from a
'.set_ringparam' (see struct ethtool_ops) where sleep is also allowed.
Both cases are protected by 'rtnl_lock()', which is a mutex, so these
functions can sleep.
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'netdev_open()', GFP_ATOMIC must be used
because it can be called from a .ndo_tx_timeout function.
So this function can be called with the 'netif_tx_lock' acquired.
The call chain is:
--> tx_timeout (.ndo_tx_timeout function)
--> netdev_open
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'typhoon_init_one()' GFP_KERNEL can be used
because it is a probe function and no lock is acquired.
When memory is allocated in 'typhoon_download_firmware()', GFP_ATOMIC
must be used because it can be called from a .ndo_tx_timeout function.
So this function can be called with the 'netif_tx_lock' acquired.
The call chain is:
--> typhoon_tx_timeout (.ndo_tx_timeout function)
--> typhoon_start_runtime
--> typhoon_download_firmware
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sun, 23 Aug 2020 01:07:13 +0000 (18:07 -0700)]
net: dccp: delete repeated words
Drop duplicated words in net/dccp/.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:40:15 +0000 (16:40 -0700)]
net: netlink: delete repeated words
Drop duplicated words in net/netlink/.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:31:41 +0000 (16:31 -0700)]
net: ipv4: delete repeated words
Drop duplicate words in comments in net/ipv4/.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:16:01 +0000 (16:16 -0700)]
net: sctp: ulpqueue.c: delete duplicated word
Drop the repeated word "an".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: linux-sctp@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:16:00 +0000 (16:16 -0700)]
net: sctp: sm_make_chunk.c: delete duplicated words + fix typo
Drop the repeated words "for", "that", and "a".
Change "his" to "this".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:15:59 +0000 (16:15 -0700)]
net: sctp: protocol.c: delete duplicated words + punctuation
Drop the repeated words "of" and "that".
Add some punctuation for readability.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:15:58 +0000 (16:15 -0700)]
net: sctp: chunk.c: delete duplicated word
Drop the repeated word "the".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:15:57 +0000 (16:15 -0700)]
net: sctp: bind_addr.c: delete duplicated word
Drop the repeated word "of".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:15:56 +0000 (16:15 -0700)]
net: sctp: auth.c: delete duplicated words
Drop the repeated words "the" and "now".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Sat, 22 Aug 2020 23:15:55 +0000 (16:15 -0700)]
net: sctp: associola.c: delete duplicated words
Drop the repeated word "the" in two places.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Luke Hsiao [Sat, 22 Aug 2020 04:41:05 +0000 (21:41 -0700)]
io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.
This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.
Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
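The mask adjustment can be sketched in userspace C as follows. recvmsg_poll_mask() is a hypothetical helper, not the io_uring implementation; it only shows that a request carrying MSG_ERRQUEUE should wait for POLLERR alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: which poll events a recvmsg request waits for.
 * Error-queue reads are satisfied by POLLERR, so normal incoming data
 * (POLLIN) should not wake them up. */
static short recvmsg_poll_mask(int msg_flags)
{
	short mask = POLLIN | POLLERR;

	if (msg_flags & MSG_ERRQUEUE)
		mask &= ~POLLIN;
	return mask;
}
```

For example, recvmsg_poll_mask(MSG_ERRQUEUE) leaves only POLLERR set, while a plain recvmsg (flags 0) still waits on both events.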
Luke Hsiao [Sat, 22 Aug 2020 04:41:04 +0000 (21:41 -0700)]
io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.
Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).
This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read tcp tx
zero-copy completion notifications from the MSG_ERRQUEUE.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Arjun Roy <arjunroy@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
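For context, this is roughly how a process decodes the completion notifications once recvmsg() with MSG_ERRQUEUE returns, following the documented MSG_ZEROCOPY interface: a struct sock_extended_err with ee_origin set to SO_EE_ORIGIN_ZEROCOPY and the completed range in ee_info..ee_data. parse_zerocopy_completion() is a hypothetical helper. The sketch assumes Linux headers that provide <linux/errqueue.h> and omits socket setup (SO_ZEROCOPY, sends with MSG_ZEROCOPY).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>          /* struct timespec, needed by linux/errqueue.h */
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

/* Hypothetical helper: scan the control messages of one error-queue
 * message and extract the range of completed zero-copy sends.
 * Returns 1 and fills lo/hi on success, 0 if no completion is found. */
static int parse_zerocopy_completion(struct msghdr *msg,
				     uint32_t *lo, uint32_t *hi)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		struct sock_extended_err serr;

		/* Completions arrive as IP_RECVERR (or IPV6_RECVERR) cmsgs. */
		if (!((cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR) ||
		      (cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR)))
			continue;
		/* Copy out to avoid unaligned access into the cmsg buffer. */
		memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
		if (serr.ee_errno != 0 || serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY)
			continue;
		*lo = serr.ee_info; /* first completed send */
		*hi = serr.ee_data; /* last completed send (inclusive) */
		return 1;
	}
	return 0;
}
```

Each notification covers an inclusive range of send calls, so a caller tracking outstanding buffers can release every buffer from lo to hi.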
====================
devlink fixes for port and reporter field access
This series contains two small devlink fixes.
Patch-1 initializes the port reporter fields early enough to
avoid accessing them before they are initialized.
Patch-2 protects the port list with a lock during traversal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parav Pandit [Fri, 21 Aug 2020 19:12:21 +0000 (22:12 +0300)]
devlink: Protect devlink port list traversal
The patch cited in the Fixes tag does not protect the port list
while traversing the per-port reporter lists.
Protect it using the devlink instance lock.
Fixes: 5c243a048d48 ("devlink: Implement devlink health reporters on per-port basis") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
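The locking rule can be modeled in userspace with a pthread mutex standing in for the devlink instance lock. struct devlink_model, struct port and count_port_reporters() are hypothetical simplifications, not the devlink data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified port: just a list link and a count. */
struct port {
	struct port *next;
	int nreporters;
};

struct devlink_model {
	pthread_mutex_t lock;   /* protects port_list */
	struct port *port_list;
};

/* Walk every port's reporters while holding the instance lock, so a
 * concurrent port add/remove cannot modify the list mid-traversal. */
static int count_port_reporters(struct devlink_model *dl)
{
	int total = 0;

	pthread_mutex_lock(&dl->lock);
	for (struct port *p = dl->port_list; p; p = p->next)
		total += p->nreporters;
	pthread_mutex_unlock(&dl->lock);
	return total;
}
```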
Parav Pandit [Fri, 21 Aug 2020 19:12:20 +0000 (22:12 +0300)]
devlink: Fix per port reporter fields initialization
The patch cited in the Fixes tag initializes reporters_list and reporters_lock
of a devlink port after the port is added to the list. Once the port
is on the list, devlink_nl_cmd_health_reporter_get_dumpit()
can access the uninitialized mutex and reporters list head.
Fix it by initializing the port's reporter fields before adding the port
to the list.
Fixes: 5c243a048d48 ("devlink: Implement devlink health reporters on per-port basis") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
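The ordering fix amounts to "initialize, then publish". A userspace sketch, assuming a pthread mutex in place of the kernel mutex and a hand-rolled singly linked list in place of list_head; all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct reporter { struct reporter *next; };

struct port {
	struct port *next;
	pthread_mutex_t reporters_lock;
	struct reporter *reporter_list;
};

struct devlink_model {
	pthread_mutex_t lock;   /* protects port_list */
	struct port *port_list;
};

/* Make the port's own fields valid first; only then link it into the
 * shared list where dump handlers can find it. */
static void port_register(struct devlink_model *dl, struct port *p)
{
	pthread_mutex_init(&p->reporters_lock, NULL);
	p->reporter_list = NULL;

	pthread_mutex_lock(&dl->lock);
	p->next = dl->port_list;
	dl->port_list = p;      /* publish: visible to traversals from now on */
	pthread_mutex_unlock(&dl->lock);
}
```

Reversing the two halves of port_register() reproduces the bug: a traversal holding dl->lock could see the port before its reporters_lock exists.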
Thomas Falcon [Fri, 21 Aug 2020 18:39:01 +0000 (13:39 -0500)]
ibmvnic: Fix use-after-free of VNIC login response buffer
The login response buffer is freed after it is received
and parsed, but other functions in the driver still attempt
to read it, such as when the device is opened, causing an
Oops. Store the relevant information in the driver's
private data structures and use those instead.
Fixes: defcbbb708d4 ("ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct") Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
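The pattern behind the fix, sketched with a hypothetical two-field response: copy what later code paths need into the adapter structure, then free the transient buffer and clear the pointer. The struct layouts and function names are illustrative only; the real login response has a firmware-defined format.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical response layout; the real one has many more fields. */
struct login_rsp {
	int num_rx_queues;
	int num_tx_queues;
};

struct adapter {
	struct login_rsp *login_rsp_buf; /* transient, freed after parsing */
	int num_rx_queues;               /* cached copies used later */
	int num_tx_queues;
};

/* Copy what later code paths need, then release the buffer. */
static void handle_login_rsp(struct adapter *ad)
{
	ad->num_rx_queues = ad->login_rsp_buf->num_rx_queues;
	ad->num_tx_queues = ad->login_rsp_buf->num_tx_queues;
	free(ad->login_rsp_buf);
	ad->login_rsp_buf = NULL; /* no dangling pointer left behind */
}

/* A later path (e.g. device open) reads only the cached fields. */
static int total_queues(const struct adapter *ad)
{
	return ad->num_rx_queues + ad->num_tx_queues;
}
```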
Taehee Yoo [Fri, 21 Aug 2020 17:47:32 +0000 (17:47 +0000)]
ipvlan: advertise link netns via netlink
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Test commands:
ip netns add nst
ip link add dummy0 type dummy
ip link add ipvlan0 link dummy0 type ipvlan
ip link set ipvlan0 netns nst
ip netns exec nst ip link show ipvlan0
Linus Torvalds [Sun, 23 Aug 2020 18:37:23 +0000 (11:37 -0700)]
Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Add perf support for emitting extended registers for power10.
- A fix for CPU hotplug on pseries, where on large/loaded systems we
may not wait long enough for the CPU to be offlined, leading to
crashes.
- Addition of a raw cputable entry for Power10, which is not required
to boot, but is required to make our PMU setup work correctly in
guests.
- Three fixes for the recent changes on 32-bit Book3S to move modules
into their own segment for strict RWX.
- A fix for a recent change in our powernv PCI code that could lead to
crashes.
- A change to our perf interrupt accounting to avoid soft lockups when
using some events, found by syzkaller.
- A change in the way we handle power loss events from the hypervisor
on pseries. We no longer immediately shut down if we're told we're
running on a UPS.
- A few other minor fixes.
Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
Vaidyanathan Srinivasan, Vasant Hegde.
* tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
powerpc/pseries: Do not initiate shutdown when system is running on UPS
powerpc/perf: Fix soft lockups due to missed interrupt accounting
powerpc/powernv/pci: Fix possible crash when releasing DMA resources
powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
powerpc/fixmap: Fix the size of the early debug area
powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
powerpc/kernel: Cleanup machine check function declarations
powerpc: Add POWER10 raw mode cputable entry
powerpc/perf: Add extended regs support for power10 platform
powerpc/perf: Add support for outputting extended regs in perf intr_regs
powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
Linus Torvalds [Sun, 23 Aug 2020 18:21:16 +0000 (11:21 -0700)]
Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for x86 which removes the RDPID usage from the paranoid
entry path and unconditionally uses LSL to retrieve the CPU number.
RDPID depends on MSR_TSC_AUX. KVM has an optimization to avoid
expensive MSR reads/writes on VMENTER/EXIT. It caches the MSR values
and restores them either when leaving the run loop, on preemption or
when going out to user space. MSR_TSC_AUX is part of that lazy MSR
set, so after writing the guest value and before the lazy restore any
exception using the paranoid entry will read the guest value and use
it as CPU number to retrieve the GSBASE value for the current CPU when
FSGSBASE is enabled. As RDPID is only used in that particular entry
path, there is no reason to burden VMENTER/EXIT with two extra MSR
writes. Remove the RDPID optimization, which is not even backed by
numbers, from the paranoid entry path instead"
* tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
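A toy model of the failure described above, assuming one CPU whose MSR_TSC_AUX may still hold a guest value between VMENTER and the lazy restore. The struct and function names are hypothetical; real code obtains these values with the RDPID and LSL instructions, not from a struct.

```c
#include <assert.h>

/* Hypothetical model of one physical CPU. In this model the host's
 * MSR_TSC_AUX value is simply the CPU number. */
struct cpu_model {
	unsigned int host_cpu;  /* what the LSL-based lookup yields */
	unsigned int tsc_aux;   /* current MSR_TSC_AUX contents */
};

/* KVM's lazy MSR handling: the guest value stays loaded until the run
 * loop is left, the vCPU is preempted or the task returns to user space. */
static void kvm_load_guest_tsc_aux(struct cpu_model *c, unsigned int guest_val)
{
	c->tsc_aux = guest_val;
}

static void kvm_restore_host_tsc_aux(struct cpu_model *c)
{
	c->tsc_aux = c->host_cpu;
}

/* RDPID returns MSR_TSC_AUX, so it sees whatever value is loaded. */
static unsigned int cpu_number_rdpid(const struct cpu_model *c)
{
	return c->tsc_aux;
}

/* The LSL-based lookup does not depend on MSR_TSC_AUX at all. */
static unsigned int cpu_number_lsl(const struct cpu_model *c)
{
	return c->host_cpu;
}
```

An exception taken between kvm_load_guest_tsc_aux() and kvm_restore_host_tsc_aux() would pick the wrong CPU number via RDPID, and hence the wrong GSBASE, while LSL stays correct.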
Linus Torvalds [Sun, 23 Aug 2020 18:15:14 +0000 (11:15 -0700)]
Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fix from Thomas Gleixner:
"A single update for perf on x86 which adds support for the broken-down
bandwidth counters"
* tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
Linus Torvalds [Sun, 23 Aug 2020 18:05:47 +0000 (11:05 -0700)]
Merge tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull entry fix from Thomas Gleixner:
"A single bug fix for the common entry code.
The transcription of the x86 version messed up the reload of the
syscall number from pt_regs after ptrace and seccomp which breaks
syscall number rewriting"
* tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
core/entry: Respect syscall number rewrites
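The class of bug can be illustrated with a small model in which a hook, standing in for ptrace or seccomp, rewrites the syscall number in the register frame. All names are hypothetical; on x86-64 the real number lives in pt_regs->orig_ax.

```c
#include <assert.h>

/* Hypothetical register frame holding only the syscall number. */
struct regs_model {
	long syscall_nr;
};

/* Stand-in for ptrace/seccomp: a tracer may rewrite the number. */
typedef void (*entry_hook)(struct regs_model *regs);

/* Buggy pattern: returns the number read before the hooks ran. */
static long entry_buggy(struct regs_model *regs, entry_hook hook)
{
	long nr = regs->syscall_nr;

	hook(regs);
	return nr;
}

/* Fixed pattern: reload the number from the frame after the hooks. */
static long entry_fixed(struct regs_model *regs, entry_hook hook)
{
	hook(regs);
	return regs->syscall_nr;
}

/* Example tracer action: replace the number with an arbitrary one. */
static void rewrite_to_39(struct regs_model *regs)
{
	regs->syscall_nr = 39;
}
```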
Linus Torvalds [Sun, 23 Aug 2020 17:57:19 +0000 (10:57 -0700)]
Merge tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"A single fix correcting a reversed error severity determination check
which led to a recoverable error getting marked as fatal, by Tony
Luck"
* tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
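A sketch of the corrected severity decision. It assumes, as we understand the upstream fix, that the drivers key the decision off the RIPV (restart IP valid) bit, bit 0 of IA32_MCG_STATUS: when RIPV is set the interrupted context can be resumed, so the error is recoverable rather than fatal. The macro, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */

enum err_severity { SEV_CORRECTED, SEV_UNCORRECTED, SEV_FATAL };

/* Corrected check: RIPV set means execution can restart, so an
 * uncorrected error is recoverable; only RIPV clear is fatal. */
static enum err_severity classify(bool uncorrected, uint64_t mcgstatus)
{
	if (!uncorrected)
		return SEV_CORRECTED;
	return (mcgstatus & MODEL_MCG_STATUS_RIPV) ? SEV_UNCORRECTED : SEV_FATAL;
}
```

The reversed version of the ternary is exactly the kind of inversion the pull message describes: a restartable error reported as fatal.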
Pull networking fixes from David Miller:
"Nothing earth shattering here, lots of small fixes (f.e. missing RCU
protection, bad ref counting, missing memset(), etc.) all over the
place:
1) Use get_file_rcu() in task_file iterator, from Yonghong Song.
2) There are two ways to set remote source MAC addresses in the macvlan
driver, but only one of them validates things properly. Fix this.
From Alvin Šipraga.
3) Missing of_node_put() in gianfar probing, from Sumera
Priyadarsini.
4) Preserve device wanted feature bits across multiple netlink
ethtool requests, from Maxim Mikityanskiy.
5) Fix rcu_sched stall in task and task_file bpf iterators, from
Yonghong Song.
6) Avoid reset after device destroy in ena driver, from Shay
Agroskin.
7) Missing memset() in netlink policy export reallocation path, from
Johannes Berg.
8) Fix info leak in __smc_diag_dump(), from Peilin Ye.
9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
Tomlinson.
10) Fix negotiation of the number of data streams in SCTP, from David Laight.
11) Fix double free in connection tracker action module, from Alaa
Hleihel.
12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
net: nexthop: don't allow empty NHA_GROUP
bpf: Fix two typos in uapi/linux/bpf.h
net: dsa: b53: check for timeout
tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
net: sctp: Fix negotiation of the number of data streams.
dt-bindings: net: renesas,ether: Improve schema validation
gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
hv_netvsc: Remove "unlikely" from netvsc_select_queue
bpf: selftests: global_funcs: Check err_str before strstr
bpf: xdp: Fix XDP mode when no mode flags specified
selftests/bpf: Remove test_align leftovers
tools/resolve_btfids: Fix sections with wrong alignment
net/smc: Prevent kernel-infoleak in __smc_diag_dump()
sfc: fix build warnings on 32-bit
net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
libbpf: Fix map index used in error message
net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
net: atlantic: Use readx_poll_timeout() for large timeout
...
Linus Torvalds [Sun, 23 Aug 2020 00:11:38 +0000 (17:11 -0700)]
Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
"Fix reference counting and clean up exit paths"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_epoll_ctl(): clean the failure exits up a bit
epoll: Keep a reference on files added to the check list
Marc Zyngier [Wed, 19 Aug 2020 16:12:17 +0000 (17:12 +0100)]
epoll: Keep a reference on files added to the check list
When adding a new fd to an epoll, and this new fd is itself an
epoll fd, we recursively scan the fds attached to it
to detect cycles, and add non-epoll files to a "check list"
that gets subsequently parsed.
However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disappearing from under our feet.
Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
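A single-threaded model of the fix: each file placed on the check list takes an extra reference, so a concurrent close that drops the fd's reference no longer frees the file while the list still points at it. struct file_model and the helper names are hypothetical; the real f_count is an atomic counter.

```c
#include <assert.h>

/* Hypothetical model of struct file's reference count. */
struct file_model {
	int f_count;
	int freed;
};

static void fget_model(struct file_model *f) { f->f_count++; }

static void fput_model(struct file_model *f)
{
	if (--f->f_count == 0)
		f->freed = 1; /* stands in for releasing the file */
}

#define CHECK_MAX 8

struct check_list {
	struct file_model *files[CHECK_MAX];
	int n;
};

/* Pin each file while it sits on the check list. */
static int check_list_add(struct check_list *cl, struct file_model *f)
{
	if (cl->n == CHECK_MAX)
		return -1;
	fget_model(f);
	cl->files[cl->n++] = f;
	return 0;
}

/* After the list has been processed, drop the extra references. */
static void check_list_clear(struct check_list *cl)
{
	while (cl->n > 0)
		fput_model(cl->files[--cl->n]);
}
```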
====================
l2tp: replace custom logging code with tracepoints
The l2tp subsystem implemented custom logging macros for debugging
purposes which were controlled using a set of debugging flags in each
tunnel and session structure.
A more standard and easier-to-use approach is to use tracepoints.
This patchset refactors l2tp to:
* remove excessive logging
* tweak useful log messages to use the standard pr_* calls for logging
rather than the l2tp wrappers
* replace debug-level logging with tracepoints
* add tracepoints for capturing tunnel and session lifetime events
I note that checkpatch.pl warns about the layout of code in the
newly-added file net/l2tp/trace.h. When adding this file I followed the
example(s) of other tracepoint files in the net/ subtree since it seemed
preferable to adhere to the prevailing style rather than follow
checkpatch.pl's advice in this instance. If that's the wrong
approach please let me know.
v1 -> v2
* Fix up a build warning found by the kernel test robot
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Parkin [Sat, 22 Aug 2020 14:59:08 +0000 (15:59 +0100)]
l2tp: remove tunnel and session debug flags field
The l2tp subsystem now uses standard kernel logging APIs for
informational and warning messages, and tracepoints for debug
information.
Now that the tunnel and session debug flags are unused, remove the field
from the core structures.
Various system calls (in the case of l2tp_ppp) and netlink messages
handle the getting and setting of debug flags. To avoid breaking
userspace, don't modify the API of these calls; simply ignore set
requests, and send dummy data for get requests.
Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Parkin [Sat, 22 Aug 2020 14:59:04 +0000 (15:59 +0100)]
l2tp: add tracepoint infrastructure to core
The l2tp subsystem doesn't currently make use of tracepoints.
As a starting point for adding tracepoints, add skeleton infrastructure
for defining tracepoints for the subsystem, and for having them build
appropriately whether compiled into the kernel or built as a module.
Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Parkin [Sat, 22 Aug 2020 14:59:03 +0000 (15:59 +0100)]
l2tp: use standard API for warning log messages
The l2tp_* log wrappers only emit messages of a given category if the
tunnel or session structure has the appropriate flag set in its debug
field. Flags default to being unset.
For warning messages, this doesn't make a lot of sense since an
administrator is likely to want to know about datapath warnings without
needing to tweak the debug flags setting for a given tunnel or session
instance.
Modify l2tp_warn callsites to use pr_warn_ratelimited instead for
unconditional output of warning messages.
Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
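pr_warn_ratelimited() suppresses bursts of repeated warnings. A simplified userspace model of that kind of limiter, loosely following the interval/burst scheme of the kernel's struct ratelimit_state; the names below are hypothetical and the current time is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval`. */
struct ratelimit_model {
	long interval;   /* window length, in caller-defined ticks */
	int burst;       /* messages allowed per window */
	long begin;      /* start of the current window */
	int printed;     /* messages emitted in this window */
	int missed;      /* messages suppressed in this window */
};

/* Returns true if the caller may emit a message at time `now`. */
static bool ratelimit_ok(struct ratelimit_model *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: start a fresh one. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

Unlike the debug-flag wrappers, this emits warnings unconditionally but bounds how many a misbehaving datapath can produce.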
Tom Parkin [Sat, 22 Aug 2020 14:59:01 +0000 (15:59 +0100)]
l2tp: don't log data frames
l2tp had logging to trace data frame receipt and transmission, including
code to dump packet contents. This was originally intended to aid
debugging of core l2tp packet handling, but is of limited use now that
code is stable.
Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the nexthop code will use an empty NHA_GROUP attribute, but it
requires at least 1 entry in order to function properly. Otherwise we
end up dereferencing null or random pointers all over the place, because
no nh_grp_entry members are allocated while the nexthop code relies on
at least the first member being present. An empty NHA_GROUP doesn't make
any sense, so just disallow it.
Also add a WARN_ON for any future users of nexthop_create_group().
CC: David Ahern <dsahern@gmail.com> Fixes: 6e9d1ef54f91 ("nexthop: Add support for nexthop groups") Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
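The new check boils down to a length test on the attribute payload. A sketch, assuming the 8-byte layout of the UAPI struct nexthop_grp (mirrored locally so no kernel headers are needed); check_nha_group_len() is a hypothetical name, not the function in net/ipv4/nexthop.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UAPI struct nexthop_grp layout
 * (id, weight, two reserved fields): 8 bytes. */
struct nexthop_grp_model {
	uint32_t id;
	uint8_t weight;
	uint8_t resvd1;
	uint16_t resvd2;
};

/* An NHA_GROUP payload must hold at least one entry and a whole
 * number of entries; anything else is rejected. */
static int check_nha_group_len(size_t len)
{
	if (len == 0 || len % sizeof(struct nexthop_grp_model) != 0)
		return -EINVAL;
	return 0;
}
```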