Jia-Ju Bai [Sat, 30 May 2020 02:41:50 +0000 (10:41 +0800)]
net: vmxnet3: fix possible buffer overflow caused by bad DMA value in vmxnet3_get_rss()
The value adapter->rss_conf is stored in DMA memory, and it is assigned
to rssConf, so rssConf->indTableSize can be modified at anytime by
malicious hardware. Because rssConf->indTableSize is assigned to n,
buffer overflow may occur when the code "rssConf->indTable[n]" is
executed.
To fix this possible bug, n is checked after being used.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 29 May 2020 20:13:58 +0000 (22:13 +0200)]
flow_dissector: work around stack frame size warning
The fl_flow_key structure is around 500 bytes, so having two of them
on the stack in one function now exceeds the warning limit after an
otherwise correct change:
net/sched/cls_flower.c:298:12: error: stack frame size of 1056 bytes in function 'fl_classify' [-Werror,-Wframe-larger-than=]
I suspect the fl_classify function could be reworked to only have one
of them on the stack and modify it in place, but I could not work out
how to do that.
As a somewhat hacky workaround, move one of them into an out-of-line
function to reduce its scope. This does not necessarily reduce the stack
usage of the outer function, but at least the second copy is removed
from the stack during most of it and does not add up to whatever is
called from there.
I now see 552 bytes of stack usage for fl_classify(), plus 528 bytes
for fl_mask_lookup().
Fixes: d553afd133ac ("flow_dissector: Parse multiple MPLS Label Stack Entries") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roelof Berg [Fri, 29 May 2020 19:30:02 +0000 (21:30 +0200)]
lan743x: Added fixed link and RGMII support
Microchip lan7431 is frequently connected to a phy. However, it
can also be directly connected to a MII remote peer without
any phy in between. For supporting such a phyless hardware setup
in Linux we utilized phylib, which supports a fixed-link
configuration via the device tree. And we added support for
defining the connection type R/GMII in the device tree.
New behavior:
-------------
. The automatic speed and duplex detection of the lan743x silicon
between mac and phy is disabled. Instead phylib is used like in
other typical Linux drivers. The usage of phylib allows to
specify fixed-link parameters in the device tree.
. The device tree entry phy-connection-type is supported now with
the modes RGMII or (G)MII (default).
Development state:
------------------
. Tested with fixed-phy configurations. Not yet tested in normal
configurations with phy. Microchip kindly offered testing
as soon as the Corona measures allow this.
====================
devlink: Add support for control packet traps
So far device drivers were only able to register drop and exception
packet traps with devlink. These traps are used for packets that were
either dropped by the underlying device or encountered an exception
(e.g., missing neighbour entry) during forwarding.
However, in the steady state, the majority of the packets being trapped
to the CPU are packets that are required for the correct functioning of
the control plane. For example, ARP request and IGMP query packets.
This patch set allows device drivers to register such control traps with
devlink and expose their default control plane policy to user space.
User space can then tune the packet trap policer settings according to
its needs, as with existing packet traps.
In a similar fashion to exception traps, the action associated with such
traps cannot be changed as it can easily break the control plane. Unlike
drop and exception traps, packets trapped via control traps are not
reported to the kernel's drop monitor as they are not indicative of any
problem.
Patch set overview:
Patches #1-#3 break out layer 3 exceptions to a different group to
provide better granularity. A future patch set will make this completely
configurable.
Patch #4 adds a new trap action ('mirror') that is used for packets that
are forwarded by the device and sent to the CPU. Such packets are marked
by device drivers with 'skb->offload_fwd_mark = 1' in order to prevent
the kernel from forwarding them again.
Patch #5 adds the new trap type, 'control'.
Patches #6-#8 gradually add various control traps to devlink with proper
documentation.
Patch #9 adds a few control traps to netdevsim, which are automatically
exercised by existing devlink-trap selftest.
Patches #10 performs small refactoring in mlxsw.
Patches #11-#13 change mlxsw to register its existing control traps with
devlink.
Patch #14 adds a selftest over mlxsw that exercises all the registered
control traps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:45 +0000 (21:36 +0300)]
mlxsw: spectrum_trap: Factor out common Rx listener function
We currently have an Rx listener function for exception traps that marks
received skbs with 'offload_fwd_mark' and injects them to the kernel's
Rx path. The marking is done because all these exceptions occur during
L3 forwarding, after the packets were potentially flooded at L2.
A subsequent patch will add support for control traps. Packets received
via some of these control traps need different handling:
1. Packets might not need to be marked with 'offload_fwd_mark'. For
example, if packet was trapped before L2 forwarding
2. Packets might not need to be injected to the kernel's Rx path. For
example, sampled packets are reported to user space via the psample
module
Factor out a common Rx listener function that only reports trapped
packets to devlink. Call it from mlxsw_sp_rx_no_mark_listener() and
mlxsw_sp_rx_mark_listener() that will inject the packets to the kernel's
Rx path, without and with the marking, respectively.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:44 +0000 (21:36 +0300)]
netdevsim: Register control traps
Register two control traps with devlink. The existing selftest at
tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
over all registered traps and checks that the action of non-drop traps
cannot be changed. Up until now only exception traps were tested, now
control traps will be tested as well.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:43 +0000 (21:36 +0300)]
devlink: Add ACL control packet traps
Add packet traps for packets that are sampled / trapped by ACLs, so that
capable drivers could register them with devlink. Add documentation for
every added packet trap and packet trap group.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:42 +0000 (21:36 +0300)]
devlink: Add layer 3 control packet traps
Add layer 3 control packet traps such as ARP and DHCP, so that capable
device drivers could register them with devlink. Add documentation for
every added packet trap and packet trap group.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:41 +0000 (21:36 +0300)]
devlink: Add layer 2 control packet traps
Add layer 2 control packet traps such as STP and IGMP query, so that
capable device drivers could register them with devlink. Add
documentation for every added packet trap and packet trap group.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:39 +0000 (21:36 +0300)]
devlink: Add 'mirror' trap action
The action is used by control traps such as IGMP query. The packet is
flooded by the device, but also trapped to the CPU in order for the
software bridge to mark the receiving port as a multicast router port.
Such packets are marked with 'skb->offload_fwd_mark = 1' in order to
prevent the software bridge from flooding them again.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 29 May 2020 18:36:36 +0000 (21:36 +0300)]
devlink: Create dedicated trap group for layer 3 exceptions
Packets that hit exceptions during layer 3 forwarding must be trapped to
the CPU for the control plane to function properly. Create a dedicated
group for them, so that user space could choose to assign a different
policer for them.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The removal of mips_swiotlb_ops exposed a problem in octeon_mgmt Ethernet
driver. mips_swiotlb_ops had an mb() after most of the operations and the
removal of the ops had broken the receive functionality of the driver.
My code inspection has shown no other places except
octeon_mgmt_rx_fill_ring() where an explicit barrier would be obviously
missing. The latter function however has to make sure that "ringing the
bell" doesn't happen before RX ring entry is really written.
The patch has been successfully tested on Octeon II.
Fixes: 7f4d1828f8cb ("MIPS: remove mips_swiotlb_ops") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
the indirect flow_block infrastructure, revisited
This series fixes 033dabdd31de ("netfilter: flowtable: add indr block
setup support") that adds support for the indirect block for the
flowtable. This patch crashes the kernel with the TC CT action.
A workaround patch to cure this crash has been proposed. However, there
is another problem: The indirect flow_block still does not work for the
new TC CT action. The problem is that the existing flow_indr_block_entry
callback assumes you can look up for the flowtable from the netdevice to
get the flow_block. This flow_block allows you to offload the flows via
TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the
flow_block from the TC CT flowtables because they are _not_ bound to any
specific netdevice.
= What is the indirect flow_block infrastructure?
The indirect flow_block infrastructure allows drivers to offload
tc/netfilter rules that belong to software tunnel netdevices, e.g.
vxlan.
This indirect flow_block infrastructure relates tunnel netdevices with
drivers because there is no obvious way to relate these two things
from the control plane.
= How does the indirect flow_block work before this patchset?
Front-ends register the indirect block callback through
flow_indr_add_block_cb() if they support for offloading tunnel
netdevices.
== Setting up an indirect block
1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events.
If there is a new tunnel netdevice that the driver can offload, then the
driver invokes __flow_indr_block_cb_register() with the new tunnel
netdevice and the driver callback. The __flow_indr_block_cb_register()
call iterates over the list of the front-end callbacks.
2) The front-end callback sets up the flow_block_offload structure and it
invokes the driver callback to set up the flow_block.
3) The driver callback now registers the flow_block structure and it
returns the flow_block back to the front-end.
4) The front-end gets the flow_block object and it is now ready to
offload rules for this tunnel netdevice.
There are two possibilities, either tunnel netdevice is removed or
a netdevice (port representor) is removed.
=== Tunnel netdevice is removed
Driver waits for the NETDEV_UNREGISTER event that announces the tunnel
netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to
remove the flow_block and rules. Callgraph is very similar to the one
described above.
=== Netdevice is removed (port representor)
Driver calls __flow_indr_block_cb_unregister() to remove the existing
netfilter/tc rule that belong to the tunnel netdevice.
= How does the indirect flow_block work after this patchset?
Drivers register the indirect flow_block setup callback through
flow_indr_dev_register() if they support for offloading tunnel
netdevices.
== Setting up an indirect flow_block
1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so,
frontends call flow_indr_dev_setup_offload(). This call invokes
the drivers' indirect flow_block setup callback.
2) The indirect flow_block setup callback sets up a flow_block structure
which relates the tunnel netdevice and the driver.
3) The front-end uses flow_block and offload the rules.
Note that the operational to set up (non-indirect) flow_block is very
similar.
== Releasing the indirect flow_block
=== Tunnel netdevice is removed
This calls flow_indr_dev_setup_offload() to set down the flow_block and
remove the offloaded rules. This alternate path is exercised if
dev->netdev_ops->ndo_setup_tc is unset.
=== Netdevice is removed (port representor)
If a netdevice is removed, then it might need to to clean up the
offloaded tc/netfilter rules that belongs to the tunnel netdevice:
1) The driver invokes flow_indr_dev_unregister() when a netdevice is
removed.
2) This call iterates over the existing indirect flow_blocks
and it invokes the cleanup callback to let the front-end remove the
tc/netfilter rules. The cleanup callback already provides the
flow_block that the front-end needs to clean up.
Front-end Driver
|
flow_indr_dev_unregister(...)
|
iterate over list of indirect flow_block
and invoke cleanup callback
|
.-----------------------------
|
.
frontend_flow_block_cleanup(flow_block)
.
|
\/
remove rules to flow_block
TC_SETUP_CLSFLOWER
= About this patchset
This patchset aims to address the existing TC CT problem while
simplifying the indirect flow_block infrastructure. Saving 300 LoC in
the flow_offload core and the drivers. The operational gets aligned with
the (non-indirect) flow_blocks logic. Patchset is composed of:
Patch #1 add nf_flow_table_gc_cleanup() which is required by the
netfilter's flowtable new indirect flow_block approach.
Patch #2 adds the flow_block_indr object which is actually part of
of the flow_block object. This stores the indirect flow_block
metadata such as the tunnel netdevice owner and the cleanup
callback (in case the tunnel netdevice goes away).
This patch adds flow_indr_dev_{un}register() to allow drivers
to offer netdevice tunnel hardware offload to the front-ends.
Then, front-ends call flow_indr_dev_setup_offload() to invoke
the drivers to set up the (indirect) flow_block.
Patch #3 add the tcf_block_offload_init() helper function, this is
a preparation patch to adapt the tc front-end to use this
new indirect flow_block infrastructure.
Patch #4 updates the tc and netfilter front-ends to use the new
indirect flow_block infrastructure.
Patch #5 updates the mlx5 driver to use the new indirect flow_block
infrastructure.
Patch #6 updates the nfp driver to use the new indirect flow_block
infrastructure.
Patch #7 updates the bnxt driver to use the new indirect flow_block
infrastructure.
Patch #8 removes the indirect flow_block infrastructure version 1,
now that frontends and drivers have been translated to
version 2 (coming in this patchset).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers do not register to netdev events to set up indirect blocks
anymore. Remove __flow_indr_block_cb_register() and
__flow_indr_block_cb_unregister().
The frontends set up the callbacks through flow_indr_dev_setup_block()
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Register ndo callback via flow_indr_dev_register() and
flow_indr_dev_unregister().
No need for mlx5e_rep_indr_clean_block_privs() since flow_block_cb_free()
already releases the internal mapping via ->release callback, which in
this case is mlx5e_rep_indr_tc_block_unbind().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Update existing frontends to use flow_indr_dev_setup_offload().
This new function must be called if ->ndo_setup_tc is unset to deal
with tunnel devices.
If there is no driver that is subscribed to new tunnel device
flow_block bindings, then this function bails out with EOPNOTSUPP.
If the driver module is removed, the ->cleanup() callback removes the
entries that belong to this tunnel device. This cleanup procedures is
triggered when the device unregisters the tunnel device offload handler.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tunnel devices provide no dev->netdev_ops->ndo_setup_tc(...) interface.
The tunnel device and route control plane does not provide an obvious
way to relate tunnel and physical devices.
This patch allows drivers to register a tunnel device offload handler
for the tc and netfilter frontends through flow_indr_dev_register() and
flow_indr_dev_unregister().
The frontend calls flow_indr_dev_setup_offload() that iterates over the
list of drivers that are offering tunnel device hardware offload
support and it sets up the flow block for this tunnel device.
If the driver module is removed, the indirect flow_block ends up with a
stale callback reference. The module removal path triggers the
dev_shutdown() path to remove the qdisc and the flow_blocks for the
physical devices. However, this is not useful for tunnel devices, where
relation between the physical and the tunnel device is not explicit.
This patch introduces a cleanup callback that is invoked when the driver
module is removed to clean up the tunnel device flow_block. This patch
defines struct flow_block_indr and it uses it from flow_block_cb to
store the information that front-end requires to perform the
flow_block_cb cleanup on module removal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 28 May 2020 22:05:32 +0000 (00:05 +0200)]
net/sched: fix a couple of splats in the error path of tfc_gate_init()
trying to configure TC 'act_gate' rules with invalid control actions, the
following splat can be observed:
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 PID: 2143 Comm: tc Not tainted 5.7.0-rc6+ #168
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: 0010:hrtimer_active+0x56/0x290
[...]
Call Trace:
hrtimer_try_to_cancel+0x6d/0x330
hrtimer_cancel+0x11/0x20
tcf_gate_cleanup+0x15/0x30 [act_gate]
tcf_action_cleanup+0x58/0x170
__tcf_action_put+0xb0/0xe0
__tcf_idr_release+0x68/0x90
tcf_gate_init+0x7c7/0x19a0 [act_gate]
tcf_action_init_1+0x60f/0x960
tcf_action_init+0x157/0x2a0
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x2a3/0x39d
rtnetlink_rcv_msg+0x5f3/0x920
netlink_rcv_skb+0x121/0x350
netlink_unicast+0x439/0x630
netlink_sendmsg+0x714/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5b4/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x9a/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
this is caused by hrtimer_cancel(), running before hrtimer_init(). Fix it
ensuring to call hrtimer_cancel() only if clockid is valid, and the timer
has been initialized. After fixing this splat, the same error path causes
another problem:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 980 Comm: tc Not tainted 5.7.0-rc6+ #168
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: 0010:release_entry_list+0x4a/0x240 [act_gate]
[...]
Call Trace:
tcf_action_cleanup+0x58/0x170
__tcf_action_put+0xb0/0xe0
__tcf_idr_release+0x68/0x90
tcf_gate_init+0x7ab/0x19a0 [act_gate]
tcf_action_init_1+0x60f/0x960
tcf_action_init+0x157/0x2a0
tcf_action_add+0xd9/0x2f0
tc_ctl_action+0x2a3/0x39d
rtnetlink_rcv_msg+0x5f3/0x920
netlink_rcv_skb+0x121/0x350
netlink_unicast+0x439/0x630
netlink_sendmsg+0x714/0xbf0
sock_sendmsg+0xe2/0x110
____sys_sendmsg+0x5b4/0x890
___sys_sendmsg+0xe9/0x160
__sys_sendmsg+0xd3/0x170
do_syscall_64+0x9a/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
the problem is similar: tcf_action_cleanup() was trying to release a list
without initializing it first. Ensure that INIT_LIST_HEAD() is called for
every newly created 'act_gate' action, same as what was done to 'act_ife'
with commit 2508649596dc ("net/sched: act_ife: initalize ife->metalist
earlier").
Fixes: 0eb9d0588e39 ("net: qos: introduce a gate control flow action") CC: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 1 Jun 2020 18:35:18 +0000 (11:35 -0700)]
Merge branch 'regmap-simple-bit-helpers'
Bartosz Golaszewski says:
====================
regmap: provide simple bitops and use them in a driver
I noticed that oftentimes I use regmap_update_bits() for simple bit
setting or clearing. In this case the fourth argument is superfluous as
it's always 0 or equal to the mask argument.
This series proposes to add simple bit operations for setting, clearing
and testing specific bits with regmap.
The second patch uses all three in a driver that got recently picked into
the net-next tree.
The patches obviously target different trees so - if you're ok with
the change itself - I propose you pick the first one into your regmap
tree for v5.8 and then I'll resend the second patch to add the first
user for these macros for v5.9.
v1 -> v2:
- convert the new macros to static inline functions
v2 -> v3:
- drop unneeded ternary operator
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In many instances regmap_update_bits() is used for simple bit setting
and clearing. In these cases the last argument is redundant and we can
hide it with a static inline function.
This adds three new helpers for simple bit operations: set_bits,
clear_bits and test_bits (the last one defined as a regular function).
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 28 May 2020 12:49:57 +0000 (15:49 +0300)]
cxgb4: cleanup error code in setup_sge_queues_uld()
The caller doesn't care about the error codes, they only check for zero
vs non-zero. Still, it's better to preserve the negative error codes
from alloc_uld_rxqs() instead of changing it to 1. We can also return
directly if there is a failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Fix infinite loop in bridge and vxlan modules
When suppressing invalid IPv6 Neighbour Solicitation messages, it is
possible for the bridge and vxlan modules to get stuck in an infinite
loop. See the individual changelogs for detailed explanation of the
problem and solution.
The bug was originally reported against the bridge module, but after
auditing the code base I found that the buggy code was copied from the
vxlan module. This patch set fixes both modules. Could not find more
instances of the problem.
Please consider both patches for stable releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 1 Jun 2020 12:58:55 +0000 (15:58 +0300)]
vxlan: Avoid infinite loop when suppressing NS messages with invalid options
When proxy mode is enabled the vxlan device might reply to Neighbor
Solicitation (NS) messages on behalf of remote hosts.
In case the NS message includes the "Source link-layer address" option
[1], the vxlan device will use the specified address as the link-layer
destination address in its reply.
To avoid an infinite loop, break out of the options parsing loop when
encountering an option with length zero and disregard the NS message.
This is consistent with the IPv6 ndisc code and RFC 4886 which states
that "Nodes MUST silently discard an ND packet that contains an option
with length zero" [2].
Fixes: 4f645f2cc883 ("vxlan: fix nonfunctional neigh_reduce()") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 1 Jun 2020 12:58:54 +0000 (15:58 +0300)]
bridge: Avoid infinite loop when suppressing NS messages with invalid options
When neighbor suppression is enabled the bridge device might reply to
Neighbor Solicitation (NS) messages on behalf of remote hosts.
In case the NS message includes the "Source link-layer address" option
[1], the bridge device will use the specified address as the link-layer
destination address in its reply.
To avoid an infinite loop, break out of the options parsing loop when
encountering an option with length zero and disregard the NS message.
This is consistent with the IPv6 ndisc code and RFC 4886 which states
that "Nodes MUST silently discard an ND packet that contains an option
with length zero" [2].
Fixes: 058bb1acc6dc ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alla Segal <allas@mellanox.com> Tested-by: Alla Segal <allas@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: nexthop: Fix deadcode issue by performing a proper NULL check
After allocating the spare nexthop group it should be tested for kzalloc()
returning NULL, instead the already used nexthop group (which cannot be
NULL at this point) had been tested so far.
Additionally, if kzalloc() fails, return ERR_PTR(-ENOMEM) instead of NULL.
Coverity-id: 1463885 Reported-by: Coverity <scan-admin@coverity.com> Signed-off-by: Patrick Eigensatz <patrickeigensatz@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Here's one last bluetooth-next pull request for 5.8, which I hope can
still be accepted.
- Enabled Wide-Band Speech (WBS) support for Qualcomm wcn3991
- Multiple fixes/imprvovements to Qualcomm-based devices
- Fix GAP/SEC/SEM/BI-10-C qualfication test case
- Added support for Broadcom BCM4350C5 device
- Several other smaller fixes & improvements
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zijun Hu [Fri, 29 May 2020 14:38:31 +0000 (22:38 +0800)]
Bluetooth: hci_qca: Fix QCA6390 memdump failure
QCA6390 memdump VSE sometimes come to bluetooth driver
with wrong sequence number as illustrated as follows:
frame # in dec: frame data in hex
1396: ff fd 01 08 74 05 00 37 8f 14
1397: ff fd 01 08 75 05 00 ff bf 38
1414: ff fd 01 08 86 05 00 fb 5e 4b
1399: ff fd 01 08 77 05 00 f3 44 0a
1400: ff fd 01 08 78 05 00 ca f7 41
it is mistook for controller missing packets, so results
in page fault after overwriting memdump buffer allocated.
Fixed by ignoring QCA6390 sequence number check and
checking buffer space before writing.
Signed-off-by: Zijun Hu <zijuhu@codeaurora.org> Tested-by: Zijun Hu <zijuhu@codeaurora.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Zijun Hu [Fri, 29 May 2020 15:58:56 +0000 (23:58 +0800)]
Bluetooth: btmtkuart: Use serdev_device_write_buf() instead of serdev_device_write()
serdev_device_write() is not appropriate at here because
serdev_device_write_wakeup() is not used to release completion hold
by the former at @write_wakeup member of struct serdev_device_ops.
Fix by using serdev_device_write_buf() instead of serdev_device_write().
Signed-off-by: Zijun Hu <zijuhu@codeaurora.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2020 21:32:50 +0000 (14:32 -0700)]
Merge tag 'mac80211-next-for-davem-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Another set of changes, including
* many 6 GHz changes, though it's not _quite_ complete
(I left out scanning for now, we're still discussing)
* allow userspace SA-query processing for operating channel
validation
* TX status for control port TX, for AP-side operation
* more per-STA/TID control options
* move to kHz for channels, for future S1G operation
* various other small changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yes, staying withing 80 columns is certainly still _preferred_. But
it's not the hard limit that the checkpatch warnings imply, and other
concerns can most certainly dominate.
Increase the default limit to 100 characters. Not because 100
characters is some hard limit either, but that's certainly a "what are
you doing" kind of value and less likely to be about the occasional
slightly longer lines.
Miscellanea:
- to avoid unnecessary whitespace changes in files, checkpatch will no
longer emit a warning about line length when scanning files unless
--strict is also used
- Add a bit to coding-style about alignment to open parenthesis
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 31 May 2020 17:45:11 +0000 (10:45 -0700)]
Merge tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A pile of x86 fixes:
- Prevent a memory leak in ioperm which was caused by the stupid
assumption that the exit cleanup is always called for current,
which is not the case when fork fails after taking a reference on
the ioperm bitmap.
- Fix an arithmething overflow in the DMA code on 32bit systems
- Fill gaps in the xstate copy with defaults instead of leaving them
uninitialized
- Revert: "Make __X32_SYSCALL_BIT be unsigned long" as it turned out
that existing user space fails to build"
* tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioperm: Prevent a memory leak when fork fails
x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
copy_xstate_to_kernel(): don't leave parts of destination uninitialized
x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
14) Fix leak in inetdev_init(), from Yang Yingliang.
15) Don't try to use inet hash and unhash in l2tp code, results in
crashes. From Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
l2tp: add sk_family checks to l2tp_validate_socket
l2tp: do not use inet_hash()/inet_unhash()
net: qrtr: Allocate workqueue before kernel_bind
mptcp: remove msk from the token container at destruction time.
mptcp: fix race between MP_JOIN and close
mptcp: fix unblocking connect()
net/sched: act_ct: add nat mangle action only for NAT-conntrack
devinet: fix memleak in inetdev_init()
virtio_vsock: Fix race condition in virtio_transport_recv_pkt
drivers/net/ibmvnic: Update VNIC protocol version reporting
NFC: st21nfca: add missed kfree_skb() in an error path
neigh: fix ARP retransmit timer guard
bpf, selftests: Add a verifier test for assigning 32bit reg states to 64bit ones
bpf, selftests: Verifier bounds tests need to be updated
bpf: Fix a verifier issue when assigning 32bit reg states to 64bit ones
bpf: Fix use-after-free in fmod_ret check
net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta()
net/mlx5e: Fix MLX5_TC_CT dependencies
net/mlx5e: Properly set default values when disabling adaptive moderation
net/mlx5e: Fix arch depending casting issue in FEC
...
Nathan Errera [Thu, 28 May 2020 19:22:38 +0000 (21:22 +0200)]
cfg80211: support bigger kek/kck key length
With some newer AKMs, the KCK and KEK are bigger, so allow that
if the driver advertises support for it. In addition, add a new
attribute for the AKM so we can use it for offloaded rekeying.
Johannes Berg [Thu, 28 May 2020 19:34:41 +0000 (21:34 +0200)]
cfg80211: reject HT/VHT capabilities on 6 GHz band
On the 6 GHz band, HE should be used, but without any direct HT/VHT
capabilities, instead the HE 6 GHz band capabilities will capture
the relevant information. Reject HT/VHT capabilities here.
Johannes Berg [Thu, 28 May 2020 19:34:40 +0000 (21:34 +0200)]
cfg80211: treat 6 GHz channels as valid regardless of capability
If a 6 GHz channel exists, then we can probably safely assume that
the device actually supports it, and then it should support most
bandwidths.
This will probably need to be extended to check the interface type
and then dig into the HE capabilities for that though, to have the
correct bandwidth check.
Johannes Berg [Thu, 28 May 2020 19:34:38 +0000 (21:34 +0200)]
mac80211: use HE 6 GHz band capability and pass it to the driver
In order to handle 6 GHz AP side, take the HE 6 GHz band capability
data and pass it to the driver (which needs it for A-MPDU spacing
and A-MPDU length).
Shaul Triebitz [Thu, 28 May 2020 19:34:37 +0000 (21:34 +0200)]
mac80211: check the correct bit for EMA AP
An AP supporting EMA (Enhanced Multi-BSSID advertisement) should set
bit 83 in the extended capabilities IE (9.4.2.26 in the 802.11ax D5 spec).
So the *3rd* bit of the 10th byte should be checked.
Also, in one place, the wrong byte was checked.
(cfg80211_find_ie returns a pointer to the beginning of the IE,
so the data really starts at ie[2], so the 10th byte
should be ie[12]. To avoid this confusion, use cfg80211_find_elem
instead).
Johannes Berg [Thu, 28 May 2020 19:34:36 +0000 (21:34 +0200)]
mac80211: determine chandef from HE 6 GHz operation
Support connecting to HE 6 GHz APs and mesh networks on 6 GHz,
where the HT/VHT information is missing but instead the HE 6 GHz
band capability is present, and the 6 GHz Operation information
field is used to encode the channel configuration instead of the
HT/VHT operation elements.
Also add some other bits needed to connect to 6 GHz networks.
Johannes Berg [Thu, 28 May 2020 19:34:35 +0000 (21:34 +0200)]
mac80211: avoid using ext NSS high BW if not supported
If the AP advertises inconsistent data, namely it has CCFS1 or CCFS2,
but doesn't advertise support for 160/80+80 bandwidth or "Extended NSS
BW Support", then we cannot use any MCSes in the the higher bandwidth.
Thus, avoid connecting with higher bandwidth since it's less efficient
that way.
mac80211: build HE operation with 6 GHz oper information
Add 6 GHz operation information (IEEE 802.11ax/D6.0, Figure 9-787k)
while building HE operation element for non-HE AP. This field is used to
determine channel information in the absence of HT/VHT IEs.
Construct HE 6 GHz band capability element (IEEE 802.11ax/D6.0,
9.4.2.261) for association request and mesh beacon. The 6 GHz
capability information is passed by driver through iftypes caps.
Johannes Berg [Thu, 28 May 2020 19:34:31 +0000 (21:34 +0200)]
cfg80211: add and expose HE 6 GHz band capabilities
These capabilities cover what would otherwise be transported
in HT/VHT capabilities, but only a subset thereof that is
actually needed on 6 GHz with HE already present. Expose the
capabilities to userspace, drivers are expected to set them
as using the 6 GHz band (currently) requires HE capability.
Arend Van Spriel [Fri, 29 May 2020 09:41:43 +0000 (11:41 +0200)]
cfg80211: adapt to new channelization of the 6GHz band
The 6GHz band does not have regulatory approval yet, but things are
moving forward. However, that has led to a change in the channelization
of the 6GHz band which has been accepted in the 11ax specification. It
also fixes a missing MHZ_TO_KHZ() macro for 6GHz channels while at it.
This change is primarily thrown in to discuss how to deal with it.
I noticed ath11k adding 6G support with old channelization and ditto
for iw. It probably involves changes in hostapd as well.
Johannes Berg [Fri, 29 May 2020 12:04:27 +0000 (14:04 +0200)]
cfg80211: fix 6 GHz frequencies to kHz
The updates to change to kHz frequencies and the 6 GHz
additions evidently overlapped (or rather, I didn't see
it when applying the latter), so the 6 GHz is broken.
Fix this.
Eric Dumazet [Fri, 29 May 2020 18:32:25 +0000 (11:32 -0700)]
l2tp: add sk_family checks to l2tp_validate_socket
syzbot was able to trigger a crash after using an ISDN socket
and fool l2tp.
Fix this by making sure the UDP socket is of the proper family.
BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78
Write of size 1 at addr ffff88808ed0c590 by task syz-executor.5/3018
Memory state around the buggy address: ffff88808ed0c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88808ed0c500: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88808ed0c580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88808ed0c600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88808ed0c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 228539feba0a ("l2tp: fix races in tunnel creation") Fixes: b2079113577e ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Guillaume Nault <gnault@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 29 May 2020 18:20:53 +0000 (11:20 -0700)]
l2tp: do not use inet_hash()/inet_unhash()
syzbot recently found a way to crash the kernel [1]
Issue here is that inet_hash() & inet_unhash() are currently
only meant to be used by TCP & DCCP, since only these protocols
provide the needed hashinfo pointer.
L2TP uses a single list (instead of a hash table)
This old bug became an issue after commit cf0204725308
("bpf: Add new cgroup attach type to enable sock modifications")
since after this commit, sk_common_release() can be called
while the L2TP socket is still considered 'hashed'.
Paolo Abeni [Fri, 29 May 2020 15:49:18 +0000 (17:49 +0200)]
mptcp: fix NULL ptr dereference in MP_JOIN error path
When token lookup on MP_JOIN 3rd ack fails, the server
socket closes with a reset the incoming child. Such socket
has the 'is_mptcp' flag set, but no msk socket associated
- due to the failed lookup.
While crafting the reset packet mptcp_established_options_mp()
will try to dereference the child's master socket, causing
a NULL ptr dereference.
This change addresses the issue with explicit fallback to
TCP in such error path.
Fixes: 74f69891c81d ("mptcp: cope better with MP_JOIN failure") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_cake: Take advantage of skb->hash where appropriate
While the other fq-based qdiscs take advantage of skb->hash and doesn't
recompute it if it is already set, sch_cake does not.
This was a deliberate choice because sch_cake hashes various parts of the
packet header to support its advanced flow isolation modes. However,
foregoing the use of skb->hash entirely loses a few important benefits:
- When skb->hash is set by hardware, a few CPU cycles can be saved by not
hashing again in software.
- Tunnel encapsulations will generally preserve the value of skb->hash from
before the encapsulation, which allows flow-based qdiscs to distinguish
between flows even though the outer packet header no longer has flow
information.
It turns out that we can preserve these desirable properties in many cases,
while still supporting the advanced flow isolation properties of sch_cake.
This patch does so by reusing the skb->hash value as the flow_hash part of
the hashing procedure in cake_hash() only in the following conditions:
- If the skb->hash is marked as covering the flow headers (skb->l4_hash is
set)
AND
- NAT header rewriting is either disabled, or did not change any values
used for hashing. The latter is important to match local-origin packets
such as those of a tunnel endpoint.
The immediate motivation for fixing this was the recent patch to WireGuard
to preserve the skb->hash on encapsulation. As such, this is also what I
tested against; with this patch, added latency under load for competing
flows drops from ~8 ms to sub-1ms on an RRUL test over a WireGuard tunnel
going through a virtual link shaped to 1Gbps using sch_cake. This matches
the results we saw with a similar setup using sch_fq_codel when testing the
WireGuard patch.
Fixes: 106d5dd474da ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ravb: Mask PHY mode to avoid inserting delays twice
Until recently, the Micrel KSZ9031 PHY driver ignored any PHY mode
("RGMII-*ID") settings, but used the hardware defaults, augmented by
explicit configuration of individual skew values using the "*-skew-ps"
DT properties. The lack of PHY mode support was compensated by the
EtherAVB MAC driver, which configures TX and/or RX internal delay
itself, based on the PHY mode.
However, now the KSZ9031 driver has gained PHY mode support, delays may
be configured twice, causing regressions. E.g. on the Renesas
Salvator-X board with R-Car M3-W ES1.0, TX performance dropped from ca.
400 Mbps to 0.1-0.3 Mbps, as measured by nuttcp.
As internal delay configuration supported by the KSZ9031 PHY is too
limited for some use cases, the ability to configure MAC internal delay
is deemed useful and necessary. Hence a proper fix would involve
splitting internal delay configuration in two parts, one for the PHY,
and one for the MAC. However, this would require adding new DT
properties, thus breaking DTB backwards-compatibility.
Hence fix the regression in a backwards-compatibility way, by letting
the EtherAVB driver mask the PHY mode when it has inserted a delay, to
avoid the PHY driver adding a second delay. This also fixes messages
like:
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: *-skew-ps values should be used only with phy-mode = "rgmii"
as the PHY no longer sees the original RGMII-*ID mode.
Solving the issue by splitting configuration in two parts can be handled
in future patches, and would require retaining a backwards-compatibility
mode anyway.
Fixes: 4570040da42c24b9 ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
selftests: forwarding: Two small changes
Two unrelated changes in this patchset:
- In patch #1, convert mirror tests from using ping directly to generating
ICMP packets by mausezahn. Using ping in tests is error-prone, because
ping is too smart. On a flaky system (notably in a simulator), when
packets don't come quickly enough, more pings are sent, and that throws
off counters. This was worked around in the past by just pinging more
slowly, but using mausezahn avoids the issue as well without making the
tests unnecessary slow.
- A missing stats_update callback was recently added to act_pedit. Now that
iproute2 supports JSON dumping for pedit, extend in patch #2 the
pedit_dsfield selftest with a check that would have caught the fact that
the callback was missing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 29 May 2020 11:16:54 +0000 (14:16 +0300)]
selftests: forwarding: pedit_dsfield: Check counter value
A missing stats_update callback was recently added to act_pedit. Now that
iproute2 supports JSON dumping for pedit, extend the pedit_dsfield selftest
with a check that would have caught the fact that the callback was missing.
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Fri, 29 May 2020 11:16:53 +0000 (14:16 +0300)]
selftests: forwarding: mirror_lib: Use mausezahn
Using ping in tests is error-prone, because ping is too smart. On a
flaky system (notably in a simulator), when packets don't come quickly
enough, more pings are sent, and that throws off counters. Instead use
mausezahn to generate ICMP echo request packets. That allows us to
send them in quicker succession as well, because the reason the ping
was made slow in the first place was to make the tests work on
simulated systems.
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2020 04:47:08 +0000 (21:47 -0700)]
Merge branch 'vxlan-fdb-nexthop-misc-fixes'
Roopa Prabhu says:
====================
vxlan fdb nexthop misc fixes
Roopa Prabhu (2):
vxlan: add check to prevent use of remote ip attributes with NDA_NH_ID
vxlan: few locking fixes in nexthop event handler
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Fri, 29 May 2020 05:12:36 +0000 (22:12 -0700)]
vxlan: few locking fixes in nexthop event handler
- remove fdb from nh_list before the rcu grace period
- protect fdb->vdev with rcu
- hold spin lock before destroying fdb
Fixes: f31cc193aebd ("vxlan: support for nexthop notifiers") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Fri, 29 May 2020 05:12:35 +0000 (22:12 -0700)]
vxlan: add check to prevent use of remote ip attributes with NDA_NH_ID
NDA_NH_ID represents a remote ip or a group of remote ips.
It allows use of nexthop groups in lieu of a remote ip or a
list of remote ips supported by the fdb api.
Current code ignores the other remote ip attrs when NDA_NH_ID is
specified. In the spirit of strict checking, This commit adds a
check to explicitly return an error on incorrect usage.
Fixes: feeface2e67a ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2020 04:44:50 +0000 (21:44 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2020-05-28
This series contains updates to the ice driver only.
Anirudh (Ani) adds a poll for reset completion before proceeding with
driver initialization when the DDP package fails to load and the firmware
issues a core reset.
Jake cleans up unnecessary code, since ice_set_dflt_vsi_ctx() performs a
memset to clear the info from the context structures. Fixed a potential
double free during probe unrolling after a failure. Also fixed a
potential NULL pointer dereference upon register_netdev() failure.
Tony makes two functions static which are not called outside of their
file.
Brett refactors the ice_ena_vf_mappings(), which was doing the VF's MSIx
and queue mapping in one function which was hard to digest. So create a
new function to handle the enabling MSIx mappings and another function
to handle the enabling of queue mappings. Simplify the code flow in
ice_sriov_configure(). Created a helper function for clearing
VPGEN_VFRTRIG register, as this needs to be done on reset to notify the
VF that we are done resetting it. Fixed the initialization/creation and
reset flows, which was unnecessarily complicated, so separate the two
flows into their own functions. Renamed VF initialization functions to
make it more clear what they do and why. Added functionality to set the
VF trust mode bit on reset. Added helper functions to rebuild the VLAN
and MAC configurations when resetting a VF. Refactored how the VF reset
is handled to prevent VF reset timeouts.
Paul cleaned up code not needed during a CORER/GLOBR reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Lew [Thu, 28 May 2020 23:05:26 +0000 (16:05 -0700)]
net: qrtr: Allocate workqueue before kernel_bind
A null pointer dereference in qrtr_ns_data_ready() is seen if a client
opens a qrtr socket before qrtr_ns_init() can bind to the control port.
When the control port is bound, the ENETRESET error will be broadcasted
and clients will close their sockets. This results in DEL_CLIENT
packets being sent to the ns and qrtr_ns_data_ready() being called
without the workqueue being allocated.
Allocate the workqueue before setting sk_data_ready and binding to the
control port. This ensures that the work and workqueue structs are
allocated and initialized before qrtr_ns_data_ready can be called.
Fixes: 5d7341dac688 ("net: qrtr: Migrate nameservice to kernel from userspace") Signed-off-by: Chris Lew <clew@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2020 04:39:13 +0000 (21:39 -0700)]
Merge branch 'mptcp-a-bunch-of-fixes'
Paolo Abeni says:
====================
mptcp: a bunch of fixes
This patch series pulls together a few bugfixes for MPTCP bug observed while
doing stress-test with apache bench - forced to use MPTCP and multiple
subflows.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 29 May 2020 15:43:31 +0000 (17:43 +0200)]
mptcp: remove msk from the token container at destruction time.
Currently we remote the msk from the token container only
via mptcp_close(). The MPTCP master socket can be destroyed
also via other paths (e.g. if not yet accepted, when shutting
down the listener socket). When we hit the latter scenario,
dangling msk references are left into the token container,
leading to memory corruption and/or UaF.
This change addresses the issue by moving the token removal
into the msk destructor.
Fixes: 763873e77a3e ("mptcp: Add key generation and token tree") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The MP_JOIN socket will be leaked. Additionally we can hit
UaF for the msk 'struct socket' referenced via the 'conn'
field.
This change try to address the issue introducing some
synchronization between the MP_JOIN 3whs and mptcp_close
via the join_list spinlock. If we detect the msk is closing
the MP_JOIN socket is closed, too.
Fixes: fb5f5519370c ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 29 May 2020 15:43:29 +0000 (17:43 +0200)]
mptcp: fix unblocking connect()
Currently unblocking connect() on MPTCP sockets fails frequently.
If mptcp_stream_connect() is invoked to complete a previously
attempted unblocking connection, it will still try to create
the first subflow via __mptcp_socket_create(). If the 3whs is
completed and the 'can_ack' flag is already set, the latter
will fail with -EINVAL.
This change addresses the issue checking for pending connect and
delegating the completion to the first subflow. Additionally
do msk addresses and sk_state changes only when needed.
Fixes: a4ba69e25e7d ("mptcp: Associate MPTCP context with TCP socket") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 31 May 2020 01:14:11 +0000 (18:14 -0700)]
Merge tag 'wireless-drivers-next-2020-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.8
Third set of patches for v5.8. Final new features before the merge
window (most likely) opens, noteworthy here is adding WPA3 support to
old drivers rt2800, b43 and b43_legacy.
Major changes:
ath10k
* SDIO and SNOC busses are not experimental anymore
ath9k
* allow receive of broadcast Action frames
ath9k_htc
* allow receive of broadcast Action frames
rt2800
* enable WPA3 support out of box
b43
* enable WPA3 support
b43_legacy
* enable WPA3 support
mwifiex
* advertise max number of clients to user space
mt76
* mt7663: add remain-on-channel support
* mt7915: add spatial reuse support
* add support for mt7611n hardware
iwlwifi
* add ACPI DSM support
* support enabling 5.2GHz bands in Indonesia via ACPI
* bump FW API version to 56
====================
Signed-off-by: David S. Miller <davem@davemloft.net>