Haishuang Yan [Fri, 14 Sep 2018 04:26:47 +0000 (12:26 +0800)]
ip_gre: fix parsing gre header in ipgre_err
gre_parse_header stops parsing when csum_err is encountered, which means
tpi->key is undefined and ip_tunnel_lookup will return NULL improperly.
This patch introduce a NULL pointer as csum_err parameter. Even when
csum_err is encountered, it won't return error and continue parsing gre
header as expected.
Fixes: 724e474d950e ("gre: Remove support for sharing GRE protocol hook.") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PHY_POLL is defined as -1 which means that we would be setting all flags of the
PHY driver, this is also not a valid flag to tell PHYLIB about, just remove it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 16 Sep 2018 22:30:23 +0000 (15:30 -0700)]
Merge branch 'act_police-lockless-data-path'
Davide Caratti says:
====================
net/sched: act_police: lockless data path
the data path of 'police' action can be faster if we avoid using spinlocks:
- patch 1 converts act_police to use per-cpu counters
- patch 2 lets act_police use RCU to access its configuration data.
test procedure (using pktgen from https://github.com/netoptimizer):
# ip link add name eth1 type dummy
# ip link set dev eth1 up
# tc qdisc add dev eth1 clsact
# tc filter add dev eth1 egress matchall action police \
> rate 2gbit burst 100k conform-exceed pass/pass index 100
# for c in 1 2 4; do
> ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1
> done
net/sched: act_police: don't use spinlock in the data path
use RCU instead of spinlocks, to protect concurrent read/write on
act_police configuration. This reduces the effects of contention in the
data path, in case multiple readers are present.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
use per-CPU counters, instead of sharing a single set of stats with all
cores. This removes the need of using spinlock when statistics are read
or updated.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not put host-endian 0 or 1 into big endian feild.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In the quest to remove all stack VLA usage from the kernel[1], this
removes the VLA used for the emac xaht registers size. Since the size
of registers can only ever be 4 or 8, as detected in emac_init_config(),
the max can be hardcoded and a runtime test added for robustness.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Christian Lamparter <chunkeey@gmail.com> Cc: Ivan Mikhaylov <ivan@de.ibm.com> Cc: netdev@vger.kernel.org Co-developed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
gso_segment: Reset skb->mac_len after modifying network header
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.
This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjust by the
outer IP gso_segment handlers, but they don't set the mac_len.
Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.
Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Do not fail when IRQ are not initialized
When the Device Tree is not providing the per-port interrupts, do not fail
during b53_srab_irq_enable() but instead bail out gracefully. The SRAB driver
is used on the BCM5301X (Northstar) platforms which do not yet have the SRAB
interrupts wired up.
Fixes: a29ede51ae7d ("net: dsa: b53: Make SRAB driver manage port interrupts") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Sep 2018 16:25:41 +0000 (09:25 -0700)]
Merge branch 'vhost_net-TX-batching'
Jason Wang says:
====================
vhost_net TX batching
This series tries to batch submitting packets to underlayer socket
through msg_control during sendmsg(). This is done by:
1) Doing userspace copy inside vhost_net
2) Build XDP buff
3) Batch at most 64 (VHOST_NET_BATCH) XDP buffs and submit them once
through msg_control during sendmsg().
4) Underlayer sockets can use XDP buffs directly when XDP is enalbed,
or build skb based on XDP buff.
For the packet that can not be built easily with XDP or for the case
that batch submission is hard (e.g sndbuf is limited). We will go for
the previous slow path, passing iov iterator to underlayer socket
through sendmsg() once per packet.
This can help to improve cache utilization and avoid lots of indirect
calls with sendmsg(). It can also co-operate with the batching support
of the underlayer sockets (e.g the case of XDP redirection through
maps).
Testpmd(txonly) in guest shows obvious improvements:
Test /+pps%
XDP_DROP on TAP /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb) /+26%
Netperf TCP_STREAM TX from guest shows obvious improvements on small
packet:
Jason Wang [Wed, 12 Sep 2018 03:17:09 +0000 (11:17 +0800)]
vhost_net: batch submitting XDP buffers to underlayer sockets
This patch implements XDP batching for vhost_net. The idea is first to
try to do userspace copy and build XDP buff directly in vhost. Instead
of submitting the packet immediately, vhost_net will batch them in an
array and submit every 64 (VHOST_NET_BATCH) packets to the under layer
sockets through msg_control of sendmsg().
When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
loop without caring GUP thus it can do batch map flushing. When XDP is
not enabled or not supported, the underlayer socket need to build skb
and pass it to network core. The batched packet submission allows us
to do batching like netif_receive_skb_list() in the future.
This saves lots of indirect calls for better cache utilization. For
the case that we can't so batching e.g when sndbuf is limited or
packet size is too large, we will go for usual one packet per
sendmsg() way.
Doing testpmd on various setups gives us:
Test /+pps%
XDP_DROP on TAP /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb) /+26%
Netperf tests shows obvious improvements for small packet transmission:
Jason Wang [Wed, 12 Sep 2018 03:17:08 +0000 (11:17 +0800)]
tap: accept an array of XDP buffs through sendmsg()
This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. Tap will build skb through those XDP buffers.
This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 12 Sep 2018 03:17:07 +0000 (11:17 +0800)]
tuntap: accept an array of XDP buffs through sendmsg()
This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. If an XDP program is attached, tuntap can run
XDP program directly. If not, tuntap will build skb and do a fast
receiving since part of the work has been done by vhost_net.
This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
be used by the existed vhost_net zerocopy code. The second is XDP
buff, which allows vhost_net to pass XDP buff to TUN. This could be
used to implement accepting an array of XDP buffs from vhost_net in
the following patches.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 12 Sep 2018 03:17:03 +0000 (11:17 +0800)]
tuntap: tweak on the path of skb XDP case in tun_build_skb()
If we're sure not to go native XDP, there's no need for several things
like bh and rcu stuffs. So this patch introduces a helper to build skb
and hold page refcnt. When we found we will go through skb path, build
skb directly.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 12 Sep 2018 03:17:02 +0000 (11:17 +0800)]
tuntap: simplify error handling in tun_build_skb()
There's no need to duplicate page get logic in each action. So this
patch tries to get page and calculate the offset before processing XDP
actions (except for XDP_DROP), and undo them when meet errors (we
don't care the performance on errors). This will be used for factoring
out XDP logic.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Wed, 12 Sep 2018 03:16:59 +0000 (11:16 +0800)]
net: sock: introduce SOCK_XDP
This patch introduces a new sock flag - SOCK_XDP. This will be used
for notifying the upper layer that XDP program is attached on the
lower socket, and requires for extra headroom.
TUN will be the first user.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andre Naujoks [Mon, 10 Sep 2018 08:27:15 +0000 (10:27 +0200)]
ipv6: Add sockopt IPV6_MULTICAST_ALL analogue to IP_MULTICAST_ALL
The socket option will be enabled by default to ensure current behaviour
is not changed. This is the same for the IPv4 version.
A socket bound to in6addr_any and a specific port will receive all traffic
on that port. Analogue to IP_MULTICAST_ALL, disable this behaviour, if
one or more multicast groups were joined (using said socket) and only
pass on multicast traffic from groups, which were explicitly joined via
this socket.
Without this option disabled a socket (system even) joined to multiple
multicast groups is very hard to get right. Filtering by destination
address has to take place in user space to avoid receiving multicast
traffic from other multicast groups, which might have traffic on the same
port.
The extension of the IP_MULTICAST_ALL socketoption to just apply to ipv6,
too, is not done to avoid changing the behaviour of current applications.
Signed-off-by: Andre Naujoks <nautsch2@gmail.com> Acked-By: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Sep 2018 15:14:34 +0000 (08:14 -0700)]
Merge branch 'Lantiq-Intel-vrx200-support'
Hauke Mehrtens says:
====================
Add support for Lantiq / Intel vrx200 network
This adds basic support for the GSWIP (Gigabit Switch) found in the
VRX200 SoC.
There are different versions of this IP core used in different SoCs, but
this driver was currently only tested on the VRX200 SoC line, for other
SoCs this driver probably need some adoptions to work.
I also plan to add Layer 2 offloading to the DSA driver and later also
layer 3 offloading which is supported by the PPE HW block.
All these patches should go through the net-next tree.
This depends on the patch "MIPS: lantiq: dma: add dev pointer" which
should go into 4.19.
Changes since:
v2:
* Send patch "MIPS: lantiq: dma: add dev pointer" separately
* all: removed return in register write functions
* switch: uses phylink
* switch: uses hardware MDIO auto polling
* switch: use usleep_range() in MDIO busy check
* switch: configure MDIO bus to 2.5 MHz
* switch: disable xMII link when it is not used
* Ethernet: use NAPI for TX cleanups
* Ethernet: enable clock in open callback
* Ethernet: improve skb allocation
* Ethernet: use net_dev->stats
v1:
* Add "MIPS: lantiq: dma: add dev pointer"
* checkpatch fixes a all patches
* Added binding documentation
* use readx_poll_timeout function and ETIMEOUT error code
* integrate GPHY firmware loading into DSA driver
* renamed to NET_DSA_LANTIQ_GSWIP
* removed some needed casts
* added of_device_id.data information about the detected switch
* fixed John's email address
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: Add Lantiq / Intel DSA driver for vrx200
This adds the DSA driver for the GSWIP Switch found in the VRX200 SoC.
This switch is integrated in the DSL SoC, this SoC uses a GSWIP version
2.1, there are other SoCs using different versions of this IP block, but
this driver was only tested with the version found in the VRX200.
Currently only the basic features are implemented which will forward all
packages to the CPU and let the CPU do the forwarding. The hardware also
support Layer 2 offloading which is not yet implemented in this driver.
The GPHY FW loaded is now done by this driver and not any more by the
separate driver in drivers/soc/lantiq/gphy.c, I will remove this driver
is a separate patch. to make use of the GPHY this switch driver is
needed anyway. Other SoCs have more embedded GPHYs so this driver should
support a variable number of GPHYs. After the firmware was loaded the
GPHY can be probed on the MDIO bus and it behaves like an external GPHY,
without the firmware it can not be probed on the MDIO bus.
The clock names in the sysctrl.c file have to be changed because the
clocks are now used by a different driver. This should be cleaned up and
a real common clock driver should provide the clocks instead.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This drives the PMAC between the GSWIP Switch and the CPU in the VRX200
SoC. This is currently only the very basic version of the Ethernet
driver.
When the DMA channel is activated we receive some packets which were
send to the SoC while it was still in U-Boot, these packets have the
wrong header. Resetting the IP cores did not work so we read out the
extra packets at the beginning and discard them.
This also adapts the clock code in sysctrl.c to use the default name of
the device node so that the driver gets the correct clock. sysctrl.c
should be replaced with a proper common clock driver later.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This handles the tag added by the PMAC on the VRX200 SoC line.
The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.
This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When a DMA channel is opened the IRQ should not get activated
automatically, this allows it to pull data out manually without the help
of interrupts. This is needed for a workaround in the vrx200 Ethernet
driver.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Paul Burton <paul.burton@mips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Wed, 12 Sep 2018 02:04:21 +0000 (10:04 +0800)]
geneve: add ttl inherit support
Similar with commit 3d9bdb2c9adca ("vxlan: add ttl inherit support"),
currently ttl == 0 means "use whatever default value" on geneve instead
of inherit inner ttl. To respect compatibility with old behavior, let's
add a new IFLA_GENEVE_TTL_INHERIT for geneve ttl inherit support.
Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm
Pull drm nouveau fixes from Dave Airlie:
"I'm sending this separately as it's a bit larger than I generally like
for one driver, but it does contain a bunch of make my nvidia laptop
not die (runpm) and a bunch to make my docking station and monitor
display stuff (mst) fixes.
Lyude has spent a lot of time on these, and we are putting the fixes
into distro kernels as well asap, as it helps a bunch of standard
Lenovo laptops, so I'm fairly happy things are better than they were
before these patches, but I decided to split them out just for
clarification"
* tag 'drm-fixes-2018-09-12' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/disp/gm200-: enforce identity-mapped SOR assignment for LVDS/eDP panels
drm/nouveau/disp: fix DP disable race
drm/nouveau/disp: move eDP panel power handling
drm/nouveau/disp: remove unused struct member
drm/nouveau/TBDdevinit: don't fail when PMU/PRE_OS is missing from VBIOS
drm/nouveau/mmu: don't attempt to dereference vmm without valid instance pointer
drm/nouveau: fix oops in client init failure path
drm/nouveau: Fix nouveau_connector_ddc_detect()
drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload
drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early
drm/nouveau: Reset MST branching unit before enabling
drm/nouveau: Only write DP_MSTM_CTRL when needed
drm/nouveau: Remove useless poll_enable() call in drm_load()
drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state()
drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state()
drm/nouveau: Fix deadlocks in nouveau_connector_detect()
drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect()
drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests
drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend()
drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement
Marek Vasut [Tue, 11 Sep 2018 22:15:24 +0000 (00:15 +0200)]
net: dsa: mv88e6xxx: Make sure to configure ports with external PHYs
The MV88E6xxx can have external PHYs attached to certain ports and those
PHYs could even be on different MDIO bus than the one within the switch.
This patch makes sure that ports with such PHYs are configured correctly
according to the information provided by the PHY.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Remove set but not used variables 'fw_mbx' and 'hdr_size'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c: In function 'qlcnic_sriov_pull_bc_msg':
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c:907:6: warning:
variable 'fw_mbx' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c: In function 'qlcnic_sriov_issue_bc_post':
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c:939:16: warning:
variable 'hdr_size' set but not used [-Wunused-but-set-variable]
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix up several Kconfig dependencies in netfilter, from Martin Willi
and Florian Westphal.
2) Memory leak in be2net driver, from Petr Oros.
3) Memory leak in E-Switch handling of mlx5 driver, from Raed Salem.
4) mlx5_attach_interface needs to check for errors, from Huy Nguyen.
5) tipc_release() needs to orphan the sock, from Cong Wang.
6) Need to program TxConfig register after TX/RX is enabled in r8169
driver, not beforehand, from Maciej S. Szmigiero.
7) Handle 64K PAGE_SIZE properly in ena driver, from Netanel Belgazal.
8) Fix crash regression in ip_do_fragment(), from Taehee Yoo.
9) syzbot can create conditions where kernel log is flooded with
synflood warnings due to creation of many listening sockets, fix
that. From Willem de Bruijn.
10) Fix RCU issues in rds socket layer, from Cong Wang.
11) Fix vlan matching in nfp driver, from Pieter Jansen van Vuuren.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
nfp: flower: fix vlan match by checking both vlan id and vlan pcp
tipc: check return value of __tipc_dump_start()
s390/qeth: don't dump past end of unknown HW header
s390/qeth: use vzalloc for QUERY OAT buffer
s390/qeth: switch on SG by default for IQD devices
s390/qeth: indicate error when netdev allocation fails
rds: fix two RCU related problems
r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
erspan: fix error handling for erspan tunnel
erspan: return PACKET_REJECT when the appropriate tunnel is not found
tcp: rate limit synflood warnings further
MIPS: lantiq: dma: add dev pointer
netfilter: xt_hashlimit: use s->file instead of s->private
netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT
netfilter: cttimeout: ctnl_timeout_find_get() returns incorrect pointer to type
netfilter: conntrack: timeout interface depend on CONFIG_NF_CONNTRACK_TIMEOUT
netfilter: conntrack: reset tcp maxwin on re-register
qmi_wwan: Support dynamic config on Quectel EP06
ethernet: renesas: convert to SPDX identifiers
...
Add support for entries which are "sticky", i.e. will not change their port
if they show up from a different one. A new ndm flag is introduced for that
purpose - NTF_STICKY. We allow to set it only to non-local entries.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 13 Sep 2018 03:24:22 +0000 (20:24 -0700)]
Merge branch 'Preparing-for-phylib-linkmodes'
Andrew Lunn says:
====================
Preparing for phylib linkmodes
phylib currently makes us of a u32 bitmap for advertising, supported,
and link partner capabilities. For a long time, this has been
sufficient, for devices up to 1Gbps. With more MAC/PHY combinations
now supporting speeds greater than 1Gbps, we have run out of
bits. There is the need to replace this u32 with an
__ETHTOOL_DECLARE_LINK_MODE_MASK, which makes use of linux's generic
bitmaps.
This patchset does some of the work preparing for this change. A few
cleanups are applied to PHY drivers. Some MAC drivers directly access
members of phydev which are going to change type. These patches adds
some helpers and swaps MAC drivers to use them, mostly dealing with
Pause configuration.
v3: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Add missing at in commit message
Change Subject of patch 5
Fix return in from phy_set_asym_pause
Fix kerneldoc in phy_set_pause
v2:
Fixup bad indentation in tg3.c
Rename phy_support_pause() to phy_support_sym_pause()
Also trigger autoneg if the advertising settings have changed.
Rename phy_set_pause() to phy_set_sym_pause()
Use the bcm63xx_enet.c logic, not fec_main.c for validating pause
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 11 Sep 2018 23:53:14 +0000 (01:53 +0200)]
net: ethernet: Add helper to remove a supported link mode
Some MAC hardware cannot support a subset of link modes. e.g. often
1Gbps Full duplex is supported, but Half duplex is not. Add a helper
to remove such a link mode.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 11 Sep 2018 23:53:13 +0000 (01:53 +0200)]
net: ethernet: Fix up drivers masking pause support
PHY drivers don't indicate they support pause. They expect MAC drivers
to enable its support if the MAC has the needed hardware. Thus MAC
drivers should not mask Pause support, but enable it.
Change a few ANDs to ORs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 11 Sep 2018 23:53:12 +0000 (01:53 +0200)]
net: bcmgenet: Fix speed selection for reverse MII
The phy supported speed is being used to determine if the MAC should
be configured to 100 or 1G. The masking logic is broken. Instead, look
at 1G supported speeds to enable 1G MAC support.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 11 Sep 2018 23:53:11 +0000 (01:53 +0200)]
net: ethernet: Use phy_set_max_speed() to limit advertised speed
Many Ethernet MAC drivers want to limit the PHY to only advertise a
maximum speed of 100Mbs or 1Gbps. Rather than using a mask, make use
of the helper function phy_set_max_speed().
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The driver indicates it can do 10/100 full and half duplex, plus 1G
Full. The datasheet indicates 1G half is also supported. So make use
of the standard PHY_GBIT_FEATURES.
It could be, this was added because there is a MAC which does not
support 1G half. Bit this is the wrong place to enforce this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 11 Sep 2018 23:53:08 +0000 (01:53 +0200)]
net: phy: ste10Xp: Remove wrong SUPPORTED_Pause
The PHY driver should not indicate that Pause is supported. It is upto
the MAC drive enable it, if it supports Pause frames. So remove it
from the ste10Xp driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 11 Sep 2018 13:44:08 +0000 (06:44 -0700)]
nfp: report FW vNIC stats in interface stats
Report in standard netdev stats drops and errors as well as
RX multicast from the FW vNIC counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Sep 2018 20:18:30 +0000 (13:18 -0700)]
Merge branch 'nfp-flower-fixes'
Jakub Kicinski says:
====================
nfp: flower: fixes for flower offload
Two fixes for flower matching and tunnel encap. Pieter fixes
VLAN matching if the entire VLAN id is masked out and match
is only performed on the PCP field. Louis adds validation of
tunnel flags for encap, most importantly we should not offload
actions on IPv6 tunnels if it's not supported.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Louis Peens [Tue, 11 Sep 2018 13:38:45 +0000 (06:38 -0700)]
nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
This fixes a bug where ipv6 tunnels would report that it is
getting offloaded to hardware but would actually be rejected
by hardware.
Fixes: 81c5129e8d7f ("nfp: compile flower vxlan tunnel set actions") Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: flower: fix vlan match by checking both vlan id and vlan pcp
Previously we only checked if the vlan id field is present when trying
to match a vlan tag. The vlan id and vlan pcp field should be treated
independently.
Fixes: a4d202833224 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 11 Sep 2018 22:12:17 +0000 (15:12 -0700)]
tipc: check return value of __tipc_dump_start()
When __tipc_dump_start() fails with running out of memory,
we have no reason to continue, especially we should avoid
calling tipc_dump_done().
Fixes: 00c46f928cd1 ("tipc: call start and done ops directly in __tipc_nl_compat_dumpit()") Reported-and-tested-by: syzbot+3f8324abccfbf8c74a9f@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Sep 2018 20:12:52 +0000 (13:12 -0700)]
Merge branch 'qeth-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2018-09-12
please apply the following qeth fixes for -net.
Patch 1 resolves a regression in an error path, while patch 2 enables
the SG support by default that was newly introduced with 4.19.
Patch 3 takes care of a longstanding problem with large-order
allocations, and patch 4 fixes a potential out-of-bounds access.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: don't dump past end of unknown HW header
For inbound data with an unsupported HW header format, only dump the
actual HW header. We have no idea how much payload follows it, and what
it contains. Worst case, we dump past the end of the Inbound Buffer and
access whatever is located next in memory.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_query_oat_command() currently allocates the kernel buffer for
the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with
fragmented memory, large allocations may fail (eg. the qethqoat tool by
default uses 132KB).
Solve this issue by using vzalloc, backing the allocation with
non-contiguous memory.
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: indicate error when netdev allocation fails
Bailing out on allocation error is nice, but we also need to tell the
ccwgroup core that creating the qeth groupdev failed.
Fixes: 2b7a295719fd ("s390/qeth: allocate netdevice early") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'riscv-for-linus-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fix from Palmer Dabbelt:
"This contains what I hope to be the last RISC-V patch for 4.19.
It fixes a bug in our initramfs support by removing some broken and
obselete code"
* tag 'riscv-for-linus-4.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
riscv: Do not overwrite initrd_start and initrd_end
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers (qedi and iscsi target) so no wider impact
even if the code changes are a bit extensive"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Add the CRC size within iSCSI NVM image
scsi: iscsi: target: Fix conn_ops double free
scsi: iscsi: target: Set conn->sess to NULL when iscsi_login_set_conn_values fails
Cong Wang [Tue, 11 Sep 2018 01:27:26 +0000 (18:27 -0700)]
rds: fix two RCU related problems
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.
Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.
The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.
Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: rds-devel@oss.oracle.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Tue, 11 Sep 2018 01:05:01 +0000 (09:05 +0800)]
netlink: remove hash::nelems check in netlink_insert
The type of hash::nelems has been changed from size_t to atom_t
which in fact is int, so not need to check if BITS_PER_LONG, that
is bit number of size_t, is bigger than 32
and rht_grow_above_max() will be called to check if hashtable is
too big, ensure it can not bigger than 1<<31
Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 11 Sep 2018 00:21:42 +0000 (17:21 -0700)]
scsi: libcxgbi: fib6_ino reference in rt6_info is rcu protected
The fib6_info reference in rt6_info is rcu protected. Add a helper
to extract prefsrc from and update cxgbi_check_route6 to use it.
Fixes: a4abe4b0e6e5 ("net/ipv6: Remove rt6i_prefsrc") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED
After system suspend, sometimes the r8169 doesn't work when ethernet
cable gets pluggued.
This issue happens because rtl_reset_work() doesn't get called from
rtl8169_runtime_resume(), after system suspend.
In rtl_task(), RTL_FLAG_TASK_* only gets cleared if this condition is
met:
if (!netif_running(dev) ||
!test_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags))
...
If RTL_FLAG_TASK_ENABLED was cleared during system suspend while
RTL_FLAG_TASK_RESET_PENDING was set, the next rtl_schedule_task() won't
schedule task as the flag is still there.
So in addition to clearing RTL_FLAG_TASK_ENABLED, also clears other
flags.
Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/tls: Fixed return value when tls_complete_pending_work() fails
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has
been set to return value of tls_complete_pending_work(). This allows
return of proper error code if tls_complete_pending_work() fails.
Fixes: 3a71956a2d9a ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Mon, 10 Sep 2018 14:19:48 +0000 (22:19 +0800)]
erspan: fix error handling for erspan tunnel
When processing icmp unreachable message for erspan tunnel, tunnel id
should be erspan_net_id instead of ipgre_net_id.
Fixes: 47b2af5234e3 ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Mon, 10 Sep 2018 14:19:47 +0000 (22:19 +0800)]
erspan: return PACKET_REJECT when the appropriate tunnel is not found
If erspan tunnel hasn't been established, we'd better send icmp port
unreachable message after receive erspan packets.
Fixes: 47b2af5234e3 ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: aquantia: implement WOL and EEE support
This is v3 of WOL/EEE functionality patch for atlantic driver.
In this patchset Yana Esina and Nikita Danilov implemented:
- Upload function to interact with FW memory
- Definitions and structures necessary for the correct operation of Wake ON Lan
- The functionality Wake On Lan via ethtool (Magic packet is supported)
- The functionality for Energy-Efficient Ethernet configuration via ethtool
Version 3:
- use ETH_ALEN instead of raw number
Version 2 has the following fixes:
- patchset reorganized to extract renaming and whitespace fixes into separate
patches
- some of magic numbers replaced with defines
- reverse christmas tree applied
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed extra characters from the names of structures to unify prefixes
used through the driver code (we normally use hw_atl for hw specifics).
HW_ATL_B0_ and HW_ATL_A0_ are the same and useless copies.
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yana Esina [Mon, 10 Sep 2018 09:39:31 +0000 (12:39 +0300)]
net: aquantia: implement EEE support
Support of Energy-Efficient Ethernet to aQuantia NIC's via ethtool
(according to the IEEE 802.3az specifications)
Signed-off-by: Yana Esina <yana.esina@aquantia.com> Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Yana Esina [Mon, 10 Sep 2018 09:39:30 +0000 (12:39 +0300)]
net: aquantia: implement WOL support
Add WOL support. Currently only magic packet
(ethtool -s <ethX> wol g) feature is implemented.
Remove hw_set_power and move that to FW_OPS set_power:
because WOL configuration behaves differently on 1x and 2x
firmwares
Signed-off-by: Yana Esina <yana.esina@aquantia.com> Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Yana Esina [Mon, 10 Sep 2018 09:39:29 +0000 (12:39 +0300)]
net: aquantia: definitions for WOL
Added definitions and structures needed to support WOL.
Signed-off-by: Yana Esina <yana.esina@aquantia.com> Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Yana Esina [Mon, 10 Sep 2018 09:39:28 +0000 (12:39 +0300)]
net: aquantia: fix hw_atl_utils_fw_upload_dwords
This patch fixes the upload function, which worked incorrectly with
some chips.
Signed-off-by: Yana Esina <yana.esina@aquantia.com> Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
With the changes in patch 1 and 2, droq lock is not required.
So removing droq lock.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Removed oom task unconditional rescheduling every 250ms and created per
queue oom work queue for refilling buffers.
The oom task refills only if the available descriptors is fallen to 64.
There will be no packets coming in after hitting this level. So NAPI will
not run until oom task refills the buffers.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Control packets are processed in tasklet when interface is down and in
NAPI when interface is up. So tasklet can be disabled when interface up
and re-enabled when interface is down.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Convert pr_info to net_info_ratelimited to limit the total number of
synflood warnings.
Commit 05dd46389fde ("tcp: Change possible SYN flooding messages")
rate limits synflood warnings to one per listener.
Workloads that open many listener sockets can still see a high rate of
log messages. Syzkaller is one frequent example.
Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xen-netback: remove unecessary condition check before debugfs_remove_recursive
debugfs_remove_recursive has taken IS_ERR_OR_NULL into account. So just
remove the condition check before debugfs_remove_recursive.
Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: xenbus: remove redundant condition check before debugfs_remove_recursive
debugfs_remove_recursive has taken the IS_ERR_OR_NULL into account. Just
remove the unnecessary condition check.
Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Sat, 8 Sep 2018 08:39:25 +0000 (11:39 +0300)]
net: dsa: b53: Uninitialized variable in b53_adjust_link()
The "pause" variable is only initialized on BCM5301x.
Fixes: 3b2398ae0e99 ("net: dsa: b53: Add helper to set link parameters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for you net tree:
1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing.
2) Restore conntrack dependency on xt_cluster, from Martin Willi.
3) Fix splat with GSO skbs from the checksum target, from Florian Westphal.
4) Rework ct timeout support, the template strategy to attach custom timeouts
is not correct since it will not work in conjunction with conntrack zones
and we have a possible free after use when removing the rule due to missing
refcounting. To fix these problems, do not use conntrack template at all
and set custom timeout on the already valid conntrack object. This
fix comes with a preparation patch to simplify timeout adjustment by
initializating the first position of the timeout array for all of the
existing trackers. Patchset from Florian Westphal.
5) Fix missing dependency on from IPv4 chain NAT type, from Florian.
6) Release chain reference counter from the flush path, from Taehee Yoo.
7) After flushing an iptables ruleset, conntrack hooks are unregistered
and entries are left stale to be cleaned up by the timeout garbage
collector. No TCP tracking is done on established flows by this time.
If ruleset is reloaded, then hooks are registered again and TCP
tracking is restored, which considers packets to be invalid. Clear
window tracking to exercise TCP flow pickup from the middle given that
history is lost for us. Again from Florian.
8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y
and CONFIG_NF_CT_NETLINK_TIMEOUT=n.
9) Broken CT target due to returning incorrect type from
ctnl_timeout_find_get().
10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner.
11) Missing conversion of hashlimit sysctl interface to new API, from
Cong Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>