Changes since v5
# Removed helper macros
# Removed 'inline' qualifier
# Changed multiline empty comment to single line
# Added 'clock-names' property in DT binding example
# Ignore 'clock-names' property in driver until f/ws in the wild are
upgraded or we support instance that take in more than one clock.
# Rebased the patchset onto net-next
Changes since v4
# Fixed ucode indexing as a word, instead of byte
# Removed redundant clocks, keep only phy rate reference clock
and expect it to be 'phy_ref_clk'
Changes since v3
# Discard 'socionext,snq-mdio', and simply use 'mdio' subnode.
# Use ioremap on ucode region as well, instead of memremap.
Changes since v2
# Use 'mdio' subnode in DT bindings.
# Use phy_interface_mode_is_rgmii(), instead of open coding the check.
# Use readl/b with eeprom_base pointer.
# Unregister mdio bus upon failure in probe.
Changes since v1
# Switched from using memremap to ioremap
# Implemented ndo_do_ioctl callback
# Defined optional 'dma-coherent' DT property
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jassi Brar [Sat, 6 Jan 2018 14:14:56 +0000 (19:44 +0530)]
MAINTAINERS: Add entry for Socionext ethernet driver
Add entry for the Socionext Netsec controller driver and DT bindings.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jassi Brar [Sat, 6 Jan 2018 14:14:37 +0000 (19:44 +0530)]
net: socionext: Add Synquacer NetSec driver
This driver adds support for Socionext "netsec" IP Gigabit
Ethernet + PHY IP used in the Synquacer SC2A11 SoC.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jassi Brar [Sat, 6 Jan 2018 14:14:19 +0000 (19:44 +0530)]
dt-bindings: net: Add DT bindings for Socionext Netsec
This patch adds documentation for Device-Tree bindings for the
Socionext NetSec Controller driver.
Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Jan 2018 19:38:06 +0000 (14:38 -0500)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-09
This series contains updates to ixgbe and ixgbevf only.
Emil fixes an issue with "wake on LAN"(WoL) where we need to ensure we
enable the reception of multicast packets so that WoL works for IPv6
magic packets. Cleaned up code no longer needed with the update to
adaptive ITR.
Paul update the driver to advertise the highest capable link speed
when a module gets inserted. Also extended the displaying of firmware
version to include the iSCSI and OEM block in the EEPROM to better
identify firmware versions/images.
Tonghao Zhang cleans up a code comment that no longer applies since
InterruptThrottleRate has been removed from the driver.
Alex fixes SR-IOV and MACVLAN offload interaction, where the MACVLAN
offload was incorrectly configuring several filters with the wrong
pool value which resulted in MACLVAN interfaces not being able to
receive traffic that had to pass over the physical interface. Fixed
transmit hangs and dropped receive frames when the number of VFs
changed. Added support for RSS on MACVLAN pools for X550 devices.
Fixed up the MACVLAN limitations so we can now support 63 offloaded
devices. Cleaned up MACVLAN code that is no longer needed with the
recent changes and fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jan 2018 17:38:57 +0000 (12:38 -0500)]
Merge branch 'r8169-improve-runtime-pm'
Heiner Kallweit says:
====================
r8169: improve runtime pm
On my system with two network ports I found that runtime PM didn't
suspend the unused port. Therefore I checked runtime pm in this driver
in somewhat more detail and this series improves runtime pm in general
and solves the mentioned issue.
Tested on a system with RTL8168evl (MAC version 34).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:13 +0000 (21:39 +0100)]
r8169: improve runtime pm in general and suspend unused ports
So far rpm doesn't cover cases like unused ports which are never
brought up. If they are active at probe time they remain in this state.
Included in this patch:
- Let the idle notification check whether we can suspend and let it
schedule the suspend. This way we don't need to have calls to
pm_schedule_suspend in different places.
- At the end of rtl_open and rtl_init_one send an idle notification
to allow suspending if the link is down. If a cable is plugged in
aneg is finished before the suspend timer expires and the suspend
request is cancelled.
- Change rtl8169_runtime_suspend to power down the chip if the
interface is down.
Successfully tested on a RTL8168evl (mac version 34).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:07 +0000 (21:39 +0100)]
r8169: improve runtime pm in rtl8169_check_link_status
This patch partially reverts commit 74fe67fdfdd3 "r8169: Fix runtime
power management" from 2010. At that time the suspend delay was 100ms
and therefore suspending happened during initial aneg. Currently
suspend delay is 5s, so suspend starts after aneg and the issue
doesn't exist any longer. On my system aneg takes almost 3s, to be on
the safe side let's increase the suspend delay to 10s.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:02 +0000 (21:39 +0100)]
r8169: remove unneeded rpm ops in rtl_shutdown
This patch reverts commit c45858303193 "r8169: runtime resume before
shutdown" from 2012. Few months after this change the underlying issue
was solved in the PCI core with commit 88b3b7a69d5c "PCI/PM: Resume
device before shutdown".
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tipc: improvements to group messaging
We make a number of simplifications and improvements to the group
messaging service. They aim at readability/maintainability of the code
as well as scalability.
The series is based on commit efb7a104081e ("tipc: fix problems with
multipoint-to-point flow control) which has been applied to 'net' but
not yet to 'net-next'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:31 +0000 (21:03 +0100)]
tipc: improve poll() for group member socket
The current criteria for returning POLLOUT from a group member socket is
too simplistic. It basically returns POLLOUT as soon as the group has
external destinations, something obviously leading to a lot of spinning
during destination congestion situations. At the same time, the internal
congestion handling is unnecessarily complex.
We now change this as follows.
- We introduce an 'open' flag in struct tipc_group. This flag is used
only to help poll() get the setting of POLLOUT right, and *not* for
congeston handling as such. This means that a user can choose to
ignore an EAGAIN for a destination and go on sending messages to
other destinations in the group if he wants to.
- The flag is set to false every time we return EAGAIN on a send call.
- The flag is set to true every time any member, i.e., not necessarily
the member that caused EAGAIN, is removed from the small_win list.
- We remove the group member 'usr_pending' flag. The size of the send
window and presence in the 'small_win' list is sufficient criteria
for recognizing congestion.
This solution seems to be a reasonable compromise between 'anycast',
which is normally not waiting for POLLOUT for a specific destination,
and the other three send modes, which are.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:30 +0000 (21:03 +0100)]
tipc: improve groupcast scope handling
When a member joins a group, it also indicates a binding scope. This
makes it possible to create both node local groups, invisible to other
nodes, as well as cluster global groups, visible everywhere.
In order to avoid that different members end up having permanently
differing views of group size and memberhip, we must inhibit locally
and globally bound members from joining the same group.
We do this by using the binding scope as an additional separator between
groups. I.e., a member must ignore all membership events from sockets
using a different scope than itself, and all lookups for message
destinations must require an exact match between the message's lookup
scope and the potential target's binding scope.
Apart from making it possible to create local groups using the same
identity on different nodes, a side effect of this is that it now also
becomes possible to create a cluster global group with the same identity
across the same nodes, without interfering with the local groups.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:29 +0000 (21:03 +0100)]
tipc: add option to suppress PUBLISH events for pre-existing publications
Currently, when a user is subscribing for binding table publications,
he will receive a PUBLISH event for all already existing matching items
in the binding table.
However, a group socket making a subscriptions doesn't need this initial
status update from the binding table, because it has already scanned it
during the join operation. Worse, the multiplicatory effect of issuing
mutual events for dozens or hundreds group members within a short time
frame put a heavy load on the topology server, with the end result that
scale out operations on a big group tend to take much longer than needed.
We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
subscriptions, so that this initial avalanche of events is suppressed.
This change, along with the previous commit, significantly improves the
range and speed of group scale out operations.
We keep the new option internal for the tipc driver, at least for now.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:28 +0000 (21:03 +0100)]
tipc: send out join messages as soon as new member is discovered
When a socket is joining a group, we look up in the binding table to
find if there are already other members of the group present. This is
used for being able to return EAGAIN instead of EHOSTUNREACH if the
user proceeds directly to a send attempt.
However, the information in the binding table can be used to directly
set the created member in state MBR_PUBLISHED and send a JOIN message
to the peer, instead of waiting for a topology PUBLISH event to do this.
When there are many members in a group, the propagation time for such
events can be significant, and we can save time during the join
operation if we use the initial lookup result fully.
In this commit, we eliminate the member state MBR_DISCOVERED which has
been the result of the initial lookup, and do instead go directly to
MBR_PUBLISHED, which initiates the setup.
After this change, the tipc_member FSM looks as follows:
Jon Maloy [Mon, 8 Jan 2018 20:03:27 +0000 (21:03 +0100)]
tipc: simplify group LEAVE sequence
After the changes in the previous commit the group LEAVE sequence
can be simplified.
We now let the arrival of a LEAVE message unconditionally issue a group
DOWN event to the user. When a topology WITHDRAW event is received, the
member, if it still there, is set to state LEAVING, but we only issue a
group DOWN event when the link to the peer node is gone, so that no
LEAVE message is to be expected.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:26 +0000 (21:03 +0100)]
tipc: create group member event messages when they are needed
In the current implementation, a group socket receiving topology
events about other members just converts the topology event message
into a group event message and stores it until it reaches the right
state to issue it to the user. This complicates the code unnecessarily,
and becomes impractical when we in the coming commits will need to
create and issue membership events independently.
In this commit, we change this so that we just notice the type and
origin of the incoming topology event, and then drop the buffer. Only
when it is time to actually send a group event to the user do we
explicitly create a new message and send it upwards.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:24 +0000 (21:03 +0100)]
tipc: let group member stay in JOINED mode if unable to reclaim
We handle a corner case in the function tipc_group_update_rcv_win().
During extreme pessure it might happen that a message receiver has all
its active senders in RECLAIMING or REMITTED mode, meaning that there
is nobody to reclaim advertisements from if an additional sender tries
to go active.
Currently we just set the new sender to ACTIVE anyway, hence at least
theoretically opening up for a receiver queue overflow by exceeding the
MAX_ACTIVE limit. The correct solution to this is to instead add the
member to the pending queue, while letting the oldest member in that
queue revert to JOINED state.
In this commit we refactor the code for handling message arrival from
a JOINED member, both to make it more comprehensible and to cover the
case described above.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset by Jenny adds sanity checks in ethtool ringparam
operation for input upper bounds, similarly to what's done in
ethtool_set_channels.
The checks are added in patch 1, using a call to get_ringparam
prior to calling set_ringparam NDO.
Patch 2 changes the function's behavior in mlx4_en, so that
it returns an error for out-of-range input, instead of rounding
it to closest valid, similar to mlx5e.
Patch 3 removes the upper bound checks in mlx5e_ethtool_set_ringparam
as it becomes redundant.
Series generated against net-next commit: d4b5920ac300 Merge branch 'ipv6-ipv4-nexthop-align'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx5e: Remove redundant checks in set_ringparam
Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4_en: Align behavior of set ring size flow via ethtool
In current implementation, any requested RX/TX ring size value
that is less than minimum is silently casted to nearest valid value.
Update this behavior to align with mlx5 behavior by printing warning
in dmesg and remaining the size unchanged.
Kernel is responsible for verifying against the maximum.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool: Ensure new ring parameters are within bounds during SRINGPARAM
Add a sanity check to ensure that all requested ring parameters
are within bounds, which should reduce errors in driver implementation.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 22 Nov 2017 18:56:52 +0000 (10:56 -0800)]
ixgbe: Drop l2_accel_priv data pointer from ring struct
The l2 acceleration private pointer isn't needed in the ring struct. It
isn't really used anywhere other than to test and see if we are supporting
an offloaded macvlan netdev, and it is much easier to test netdev for not
being ixgbe based to verify that.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:46 +0000 (10:56 -0800)]
ixgbe: Use ring values to test for Tx pending
This patch simplifies the check for Tx pending traffic and makes it more
holistic as there being any difference between next_to_use and
next_to_clean is much more informative than if head and tail are equal, as
it is possible for us to either not update tail, or not be notified of
completed work in which case next_to_clean would not be equal to head.
In addition the simplification makes it so that we don't have to read
hardware which allows us to drop a number of variables that were previously
being used in the call.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:40 +0000 (10:56 -0800)]
ixgbe: Fix limitations on macvlan so we can support up to 63 offloaded devices
This change is a fix of the macvlan offload so that we correctly handle
macvlan offloaded devices. Specifically we were configuring our limits based
on the assumption that we were going to max out the RSS indices for every
mode. As a result when we went to 15 or more macvlan interfaces we were
forced into the 2 queue RSS mode on VFs even though they could have still
supported 4.
This change splits the logic up so that we limit either the total number of
macvlan instances if DCB is enabled, or limit the number of RSS queues used
per macvlan (instead of per pool) if SR-IOV is enabled. By doing this we
can make best use of the part.
In addition I have increased the maximum number of supported interfaces to
63 with one queue per offloaded interface as this more closely reflects the
actual values supported by the interface.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:34 +0000 (10:56 -0800)]
ixgbe: There is no need to update num_rx_pools in L2 fwd offload
The num_rx_pools value is overwritten when we reinitialize the queue
configuration. In reality we shouldn't need to be updating the value since
it is redone every time we call into ixgbe_setup_tc so for now just drop
the spots where we were incrementing or decrementing the value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:28 +0000 (10:56 -0800)]
ixgbe: Add support for macvlan offload RSS on X550 and clean-up pool handling
In order for RSS to work on the macvlan pools of the X550 we need to
populate the MRQC, RETA, and RSS key values for each pool. This patch makes
it so that we now take care of that.
In addition I have dropped the macvlan specific configuration of psrtype
since it is redundant with the code that already exists for configuring
this value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:22 +0000 (10:56 -0800)]
ixgbe: Perform reinit any time number of VFs change
If the number of VFs are changed we need to reinitialize the part since the
offset for the device and the number of pools will be incorrect. Without
this change we can end up seeing Tx hangs and dropped Rx frames for
incoming traffic.
In addition we should drop the code that is arbitrarily changing the
default pool and queue configuration. Instead we should wait until the port
is reset and reconfigured via ixgbe_sriov_reinit.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 22 Nov 2017 18:56:16 +0000 (10:56 -0800)]
ixgbe: Fix interaction between SR-IOV and macvlan offload
When SR-IOV was enabled the macvlan offload was configuring several filters
with the wrong pool value. This would result in the macvlan interfaces not
being able to receive traffic that had to pass over the physical interface.
To fix it wrap the pool argument in the VMDQ_P macro which will add the
necessary offset to get to the actual VMDq pool
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 21 Nov 2017 23:58:27 +0000 (15:58 -0800)]
ixgbevf: remove redundant setting of xcast_mode
Removed leftover assignment of xcast_mode.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jason Baron [Fri, 5 Jan 2018 22:44:54 +0000 (17:44 -0500)]
virtio_net: propagate linkspeed/duplex settings from the hypervisor
The ability to set speed and duplex for virtio_net is useful in various
scenarios as described here:
f932819 virtio_net: add ethtool support for set and get of settings
However, it would be nice to be able to set this from the hypervisor,
such that virtio_net doesn't require custom guest ethtool commands.
Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.
Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
is that device feature bits are to grow down from bit 63, since the
transports are starting from bit 24 and growing up.
Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Greenwalt [Fri, 27 Oct 2017 14:32:40 +0000 (10:32 -0400)]
ixgbe: extend firmware version support
Extend FW version reporting by displaying information from the iSCSI
or OEM block in the EEPROM.
This will allow us to more accurately identify the FW.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Felix Walter [Fri, 5 Jan 2018 13:33:31 +0000 (14:33 +0100)]
macsec: Add support for GCM-AES-256 cipher suite
This adds support for the GCM-AES-256 cipher suite as specified in
IEEE 802.1AEbn-2011. The prepared cipher suite selection mechanism is used,
with GCM-AES-128 being the default cipher suite as defined in the standard.
Signed-off-by: Felix Walter <felix.walter@cloudandheat.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Greenwalt [Tue, 24 Oct 2017 15:00:40 +0000 (11:00 -0400)]
ixgbe: advertise highest capable link speed
On module insert advertise highest capable link speed. If module is
capable of 10G, then advertise 10G, else advertise modules capable
link speeds.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Mon, 16 Oct 2017 23:55:18 +0000 (16:55 -0700)]
ixgbe: remove unused enum latency_range
This enum is no longer needed after
commit: 7e7371d006e ("ixgbe: Update adaptive ITR algorithm")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 10 Oct 2017 20:20:01 +0000 (13:20 -0700)]
ixgbe: enable multicast on shutdown for WOL
Previously we only enabled the reception of multicast packets when
wake on multicast is set, but we also need this to allow waking with
IPv6 magic packets.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Tue, 9 Jan 2018 15:57:19 +0000 (10:57 -0500)]
Merge branch 'XDP-transmission-for-tuntap'
Jason Wang says:
====================
XDP transmission for tuntap
This series tries to implement XDP transmission (ndo_xdp_xmit) for
tuntap. Pointer ring was used for queuing both XDP buffers and
sk_buff, this is done by encoding the type into lowest bit of the
pointer and storin XDP metadata in the headroom of XDP buff.
Tests gets 3.05 Mpps when doing xdp_redirect_map from ixgbe to VM
(testpmd + virtio-net in guest). This gives us ~20% improvments
compared to use skb during redirect.
Please review.
Changes from V1:
- slient warnings
- fix typos
- add skb mode number in the commit log
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 4 Jan 2018 03:14:28 +0000 (11:14 +0800)]
tuntap: XDP transmission
This patch implements XDP transmission for TAP. Since we can't create
new queues for TAP during XDP set, exist ptr_ring was reused for
queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
(0x1UL) was encoded into lowest bit of xpd_buff pointer during
ptr_ring_produce, and was decoded during consuming. XDP metadata was
stored in the headroom of the packet which should work in most of
cases since driver usually reserve enough headroom. Very minor changes
were done for vhost_net: it just need to peek the length depends on
the type of pointer.
Tests were done on two Intel E5-2630 2.40GHz machines connected back
to back through two 82599ES. Traffic were generated/received through
MoonGen/testpmd(rxonly). It reports ~20% improvements when
xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps
to 3.05Mpps)
Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
Alexander Duyck.
2) Undo unintentional UAPI change in netfilter conntrack, from Florian
Westphal.
3) Revert a change to how error codes are returned from
dev_get_valid_name(), it broke some apps.
4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
dual-stack. From Eli Cooper.
5) Fix missed PMTU updates in geneve, from Xin Long.
6) Cure double free in macvlan, from Gao Feng.
7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
Mohamed Ghannam.
8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
deferral of probe when the regulator is not ready yet).
9) Missing DMA mapping error checks in 3c59x, from Neil Horman.
10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.
11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
isn't initialized. From Andrei Vagin.
12) Fix crashes in fib6_add(), from Wei Wang.
13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
sh_eth: fix TXALCR1 offsets
mdio-sun4i: Fix a memory leak
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
sctp: fix the handling of ICMP Frag Needed for too small MTUs
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
xen-netfront: enable device after manual module load
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
sh_eth: fix SH7757 GEther initialization
net: fec: free/restore resource in related probe error pathes
uapi/if_ether.h: prevent redefinition of struct ethhdr
ipv6: fix general protection fault in fib6_add()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
net: stmmac: enable EEE in MII, GMII or RGMII only
rtnetlink: give a user socket to get_target_net()
MAINTAINERS: Update my email address.
can: ems_usb: improve error reporting for error warning and error passive
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
can: gs_usb: fix return value of the "set_bittiming" callback
...
Yang Shi [Mon, 8 Jan 2018 19:52:54 +0000 (03:52 +0800)]
net: tipc: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by TIPC at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 8 Jan 2018 19:52:53 +0000 (03:52 +0800)]
net: ovs: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by openvswitch at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: dev@openvswitch.org Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 8 Jan 2018 19:52:52 +0000 (03:52 +0800)]
net: caif: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by caif at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:40 +0000 (12:08 +0200)]
8139cp: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:39 +0000 (12:08 +0200)]
bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:38 +0000 (12:08 +0200)]
e1000: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:35 +0000 (12:08 +0200)]
net: Fix netdev_WARN_ONCE macro
netdev_WARN_ONCE is broken (whoops..), this fix will remove the
unnecessary "condition" parameter, add the missing comma and change
"arg" to "args".
Fixes: 3ca9b6737651 ("net: Introduce netdev_*_once functions") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your
net-next tree:
1) Free hooks via call_rcu to speed up netns release path, from
Florian Westphal.
2) Reduce memory footprint of hook arrays, skip allocation if family is
not present - useful in case decnet support is not compiled built-in.
Patches from Florian Westphal.
3) Remove defensive check for malformed IPv4 - including ihl field - and
IPv6 headers in x_tables and nf_tables.
4) Add generic flow table offload infrastructure for nf_tables, this
includes the netlink control plane and support for IPv4, IPv6 and
mixed IPv4/IPv6 dataplanes. This comes with NAT support too. This
patchset adds the IPS_OFFLOAD conntrack status bit to indicate that
this flow has been offloaded.
5) Add secpath matching support for nf_tables, from Florian.
6) Save some code bytes in the fast path for the nf_tables netdev,
bridge and inet families.
7) Allow one single NAT hook per point and do not allow to register NAT
hooks in nf_tables before the conntrack hook, patches from Florian.
8) Seven patches to remove the struct nf_af_info abstraction, instead
we perform direct calls for IPv4 which is faster. IPv6 indirections
are still needed to avoid dependencies with the 'ipv6' module, but
these now reside in struct nf_ipv6_ops.
9) Seven patches to handle NFPROTO_INET from the Netfilter core,
hence we can remove specific code in nf_tables to handle this
pseudofamily.
10) No need for synchronize_net() call for nf_queue after conversion
to hook arrays. Also from Florian.
11) Call cond_resched_rcu() when dumping large sets in ipset to avoid
softlockup. Again from Florian.
12) Pass lockdep_nfnl_is_held() to rcu_dereference_protected(), patch
from Florian Westphal.
13) Fix matching of counters in ipset, from Jozsef Kadlecsik.
14) Missing nfnl lock protection in the ip_set_net_exit path, also
from Jozsef.
15) Move connlimit code that we can reuse from nf_tables into
nf_conncount, from Florian Westhal.
And asorted cleanups:
16) Get rid of nft_dereference(), it only has one single caller.
17) Add nft_set_is_anonymous() helper function.
18) Remove NF_ARP_FORWARD leftover chain definition in nf_tables_arp.
19) Remove unnecessary comments in nf_conntrack_h323_asn1.c
From Varsha Rao.
20) Remove useless parameters in frag_safe_skb_hp(), from Gao Feng.
21) Constify layer 4 conntrack protocol definitions, function
parameters to register/unregister these protocol trackers, and
timeouts. Patches from Florian Westphal.
22) Remove nlattr_size indirection, from Florian Westphal.
23) Add fall-through comments as -Wimplicit-fallthrough needs this,
from Gustavo A. R. Silva.
24) Use swap() macro to exchange values in ipset, patch from
Gustavo A. R. Silva.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 9 Jan 2018 00:17:31 +0000 (16:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
- One line fix to mlx4 error flow (same as mlx5 fix in last pull
request, just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that
they don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/srpt: Fix ACL lookup during login
IB/srpt: Disable RDMA access by the initiator
RDMA/netlink: Fix locking around __ib_get_device_by_index
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Yafang Shao [Sun, 7 Jan 2018 06:31:47 +0000 (14:31 +0800)]
net: tracepoint: exposing sk_faimily in tracepoint inet_sock_set_state
As of now, there're two sk_family are traced with sock:inet_sock_set_state,
which are AF_INET and AF_INET6.
So the sk_family are exposed as well.
Then we can conveniently use it to do the filter.
Both sk_family and sk_protocol are showed in the printk message, so we need
not expose them as tracepoint arguments.
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Suggested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 21:26:47 +0000 (00:26 +0300)]
sh_eth: fix TXALCR1 offsets
The TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error. Luckily, the driver never uses this
register. :-)
Fixes: 1745b8bdaf8f ("net: sh_eth: modify the definitions of register") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.
Fixes: 3415c8cb2449 ("net: Add MDIO bus driver for the Allwinner EMAC") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 5 Jan 2018 18:47:14 +0000 (19:47 +0100)]
l2tp: adjust comments about L2TPv3 offsets
The "offset" option has been removed by
commit 2de8d28f35cc ("l2tp: remove configurable payload offset").
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1463447 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:19:13 +0000 (14:19 -0500)]
Merge branch 'SCTP-PMTU-discovery-fixes'
Marcelo Ricardo Leitner says:
====================
SCTP PMTU discovery fixes
This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.
Please consider these to stable.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 5 Jan 2018 16:07:10 +0000 (16:07 +0000)]
net: phy: fix wrong masks to phy_modify()
The mask argument for phy_modify() in several locations was inverted.
Fixes: 9799ed22fc90 ("net: phy: convert read-modify-write to phy_modify()") Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: fix the handling of ICMP Frag Needed for too small MTUs
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit afc378a99524 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.
The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.
Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Otubo [Fri, 5 Jan 2018 08:42:16 +0000 (09:42 +0100)]
xen-netfront: enable device after manual module load
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.
The guest would face errors like:
[root@guest ~]# ethtool -i eth0
Cannot get driver information: No such device
[root@guest ~]# ifconfig eth0
eth0: error fetching interface information: Device not found
This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.
Signed-off-by: Eduardo Otubo <otubo@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Fri, 5 Jan 2018 04:06:39 +0000 (23:06 -0500)]
forcedeth: remove duplicate structure member in rx
Since both first_rx and rx_ring are the head of rx ring, it not
necessary to use two structure members to statically indicate
the head of rx ring. So first_rx is removed.
CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:13:45 +0000 (14:13 -0500)]
Merge branch 'bnxt_en_fixes'
Michael Chan says:
====================
bnxt_en: 2 small bug fixes.
The first one fixes the TC Flower flow parameter passed to firmware. The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
Fixes: dfe047769e71 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Challa [Thu, 4 Jan 2018 23:46:54 +0000 (18:46 -0500)]
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.
Fixes: 06410ad6118c ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 8 Jan 2018 19:13:08 +0000 (11:13 -0800)]
Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This contains fixes for the following two non-trivial issues:
- The task iterator got broken while adding thread mode support for
v4.14. It was less visible because it only triggers when both
cgroup1 and cgroup2 hierarchies are in use. The recent versions of
systemd uses cgroup2 for process management even when cgroup1 is
used for resource control exposing this issue.
- cpuset CPU hotplug path could deadlock when racing against exits.
There also are two patches to replace unlimited strcpy() usages with
strlcpy()"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
cgroup: Fix deadlock in cpu hotplug path
cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
cgroup: avoid copying strings longer than the buffers
Stefano Brivio [Thu, 4 Jan 2018 23:38:05 +0000 (00:38 +0100)]
tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertions
The two conditions triggering BUG_ON() are somewhat unrelated:
the tcp_skb_pcount() check is meant to catch TSO flaws, the
second one checks sanity of congestion window bookkeeping.
Split them into two separate BUG_ON() assertions on two lines,
so that we know which one actually triggers, when they do.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 4 Jan 2018 22:03:54 +0000 (14:03 -0800)]
net: ipv6: Allow connect to linklocal address from socket bound to vrf
Allow a process bound to a VRF to connect to a linklocal address.
Currently, this fails because of a mismatch between the scope of the
linklocal address and the sk_bound_dev_if inherited by the VRF binding:
$ ssh -6 fe80::70b8:cff:fedd:ead8%eth1
ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument
Relax the scope check to allow the socket to be bound to the same L3
device as the scope id.
This makes ipv6 linklocal consistent with other relaxed checks enabled
by commits a5b9ecfc76df ("net: l3mdev: Allow send on enslaved interface")
and 3a3177977e932 ("net: Allow IP_MULTICAST_IF to set index to L3 slave").
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Thu, 4 Jan 2018 21:26:46 +0000 (00:26 +0300)]
sh_eth: remove sh_eth_plat_data::edmac_endian
Since the commit e7b492a5d18 ("sh_eth: remove EDMAC_BIG_ENDIAN") (geez,
I didn't realize that was 2 years ago!) the initializers in the SuperH
platform code for the 'sh_eth_plat_data::edmac_endian' stopped to matter,
so we can remove that field for good (not sure if it was ever useful --
SH7786 Ether has been reported to have the same EDMAC descriptor/register
endiannes as configured for the SuperH CPU)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:06:20 +0000 (14:06 -0500)]
Merge branch 'hns3-next'
Peng Li says:
====================
add some new features and fix some bugs for HNS3 driver
This patchset adds some new features support and fixes some bugs:
[Patch 1/20] adds support to enable/disable vlan filter with ethtool
[Patch 2/20] disables VFs change rxvlan offload status
[Patch 3/20 - 13/120 fix bugs and refine some codes for packet
statistics, support query with both ifconfig and ethtool.
[Patch 14/20 - 20/20] fix some other bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:22 +0000 (18:18 +0800)]
net: hns3: fix for not setting pause parameters
Pause parameters include source address, transmit gap and pause time.
The default value of the pause source address is zero in the hardware.
Default pause parameters need to be set to the hardware. Also, when
setting new mac address, the pause source address need to be updated.
Fixes: eda20620a1da ("net: hns3: Add support for PFC setting in TM module") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:21 +0000 (18:18 +0800)]
net: hns3: add MTU initialization for hardware
When initializing the MAC, the MTU vlaue need to be set to the hardware
too. Otherwise, the MTU value of software will be different from the MTU
value of hardware.
Fixes: 22c0e2ce21dd ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:20 +0000 (18:18 +0800)]
net: hns3: fix for changing MTU
when changing MTU, The new MTU must need to be set to netdevice.
Fixes: e2e33f0baec0 ("net: hns3: Add support to change MTU in HNS3 hardware") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:19 +0000 (18:18 +0800)]
net: hns3: fix for setting MTU
When setting MTU, actually what we do is configuring the max frame size
for the hardware. ETH_HLEN、ETH_FCS_LEN and VLAN_HLEN must need to be
considered. And the frame size which is less than the default value
should not be set to the hardware. Because in the hardware, the the max
frame size not only controls the RX packet size, but also controls the
TX packet size. the RX packets whose size are greater than the setting
value will be dropped.
This patch fixes the bug setting a error max frame size to hardware.
Fixes: 22c0e2ce21dd ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:18 +0000 (18:18 +0800)]
net: hns3: fix for updating fc_mode_last_time
commit a9c782822166 ("net: hns3: add support for set_pauseparam")
adds set_pauseparam support for ethtool cmd, but forgets to update
fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam().
The wrong fc_mode_last_time will be used to update flow control mode
when lldpad has been running. As a result, when using the ethtool
command "-a", user will get a wrong pause parameter.
This patch adds the fc_mode_last_time update when PFC mode is disabled.
Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:14 +0000 (18:18 +0800)]
net: hns3: Fix an error macro definition of HNS3_TQP_STAT
The member "stats_offset" was designed to indicate the offset
of each member of struct ring_stats in struct hns3_enet_ring,
but forgot to add the offset of the member in struct ring_stats.
Fixes: 190aa2d4be5 ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:13 +0000 (18:18 +0800)]
net: hns3: Fix a loop index error of tqp statistics query
An error loop index was used while querying statistics data
of tqps, which may cause call trace.
Fixes: 190aa2d4be50 ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:12 +0000 (18:18 +0800)]
net: hns3: Fix an error of total drop packet statistics
The dropped tx/rx packets number of each tqp should also
be counted into the total drop tx/rx packets numbers.
Fixes: 6e2406330b2 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:11 +0000 (18:18 +0800)]
net: hns3: Mask the packet statistics query when NIC is down
Update the HNS3_NIC_STATE_DOWN bit when NIC state changes.
When NIC is down, mask the packet statistics for querying
with ifconfig command. It's a common practice.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:10 +0000 (18:18 +0800)]
net: hns3: Modify the update period of packet statistics
It takes more than 200 query response messages between
driver and IMP, while updating the packet statistics.
It's too heavy for IMP to update it per second.
Extend the update period of packet statistics data from
1 second to 300 seconds(if too long, the statistics may
overflow).
As a result, we need to update it while querying with
ifconfig tool to keep the statistics data fresh.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:07 +0000 (18:18 +0800)]
net: hns3: Unify the strings display of packet statistics
Some members of packet statistics are named in different styles.
This patch unifies them with new internal name rules, the main
modification are below:
trans --> tx
rcv --> rx
rcb_q%d_tx --> txq#%d
rcb_q%d_rx --> rxq#%d
sw_err_cnt(tx side) --> tx_dropped
sw_err_cnt(rx side) --> rx_dropped
pkts --> packets
tx_err_cnt --> errors
rx_err_cnt --> errors
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:06 +0000 (18:18 +0800)]
net: hns3: Disable VFs change rxvlan offload status
Rxvlan offload status can only be changed by PF. Initialize
the value of NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for
VFS to false, make sure user can't be able to change it.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series introduces the MAPv4 packet format for checksum
offload plus some other minor changes.
Patches 1-3 are cleanups.
Patch 4 renames the ingress format to data format so that all data
formats can be configured using this going forward.
Patch 5 uses the pacing helper to improve TCP transmit performance.
Patch 6-9 defines the the MAPv4 for checksum offload for RX and TX.
A new header and trailer format are used as part of MAPv4.
For RX checksum offload, only the 1's complement of the IP payload
portion is computed by hardware. The meta data from RX header is
used to verify the checksum field in the packet. Note that the
IP packet and its field itself is not modified by hardware.
This gives metadata to help with the RX checksum. For TX, the
required metadata is filled up so hardware can compute the
checksum.
Patch 10 enables GSO on rmnet devices
v1->v2: Fix sparse errors reported by kbuild test robot
v2->v3: Update the commit message for Patch 5 based on Eric's comments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Real devices may support scatter gather(SG), so enable SG on rmnet
devices to use GSO. GSO reduces CPU cycles by 20% for a rate of
146Mpbs for a single stream TCP connection.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>