Vincent Bernat [Sat, 16 Sep 2017 14:18:33 +0000 (16:18 +0200)]
bridge: also trigger RTM_NEWLINK when interface is released from bridge
Currently, when an interface is released from a bridge via
ioctl(), we get a RTM_DELLINK event through netlink:
Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
link/ether 6e:23:c2:54:3a:b3
Userspace has to interpret that as a removal from the bridge, not as a
complete removal of the interface. When an bridged interface is
completely removed, we get two events:
Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
link/ether 6e:23:c2:54:3a:b3
Deleted 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 6e:23:c2:54:3a:b3 brd ff:ff:ff:ff:ff:ff
In constrast, when an interface is released from a bond, we get a
RTM_NEWLINK with only the new characteristics (no master):
3: dummy1: <BROADCAST,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default
link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
3: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
3: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN group default
link/ether ca:c8:7b:66:f8:25 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
Userland may be confused by the fact we say a link is deleted while
its characteristics are only modified. A first solution would have
been to turn the RTM_DELLINK event in del_nbp() into a RTM_NEWLINK
event. However, maybe some piece of userland is relying on this
RTM_DELLINK to detect when a bridged interface is released. Instead,
we also emit a RTM_NEWLINK event once the interface is
released (without master info).
Deleted 2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
link/ether 8a:bb:e7:94:b1:f8
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 8a:bb:e7:94:b1:f8 brd ff:ff:ff:ff:ff:ff
This is done only when using ioctl(). When using Netlink, such an
event is already automatically emitted in do_setlink().
Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Use ipv6_authlen for len in ipv6_skip_exthdr
In ipv6_skip_exthdr, the lengh of AH header is computed manually
as (hp->hdrlen+2)<<2. However, in include/linux/ipv6.h, a macro
named ipv6_authlen is already defined for exactly the same job. This
commit replaces the manual computation code with the macro.
Signed-off-by: Xiang Gao <qasdfgtyuiop@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: speedup netns create/delete time
When rate of netns creation/deletion is high enough,
we observe softlockups in cleanup_net() caused by huge list
of netns and way too many rcu_barrier() calls.
This patch series does some optimizations in kobject,
and add batching to tunnels so that netns dismantles are
less costly.
IPv6 addrlabels also get a per netns list, and tcp_metrics
also benefit from batch flushing.
This gives me one order of magnitude gain.
(~50 ms -> ~5 ms for one netns create/delete pair)
Tested:
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo
Eric Dumazet [Tue, 19 Sep 2017 23:27:06 +0000 (16:27 -0700)]
ipv6: addrlabel: per netns list
Having a global list of labels do not scale to thousands of
netns in the cloud era. This causes quadratic behavior on
netns creation and deletion.
This is time having a per netns list of ~10 labels.
Tested:
$ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]
real 0m20.837s # instead of 0m24.227s
user 0m0.328s
sys 0m20.338s # instead of 0m23.753s
16.17% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
12.30% ip [kernel.kallsyms] [k] netlink_has_listeners
6.76% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.78% ip [kernel.kallsyms] [k] memset_erms
5.77% ip [kernel.kallsyms] [k] kobject_uevent_env
5.18% ip [kernel.kallsyms] [k] refcount_sub_and_test
4.96% ip [kernel.kallsyms] [k] _raw_read_lock
3.82% ip [kernel.kallsyms] [k] refcount_inc_not_zero
3.33% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.11% ip [kernel.kallsyms] [k] unmap_page_range
1.77% ip [kernel.kallsyms] [k] __wake_up
1.69% ip [kernel.kallsyms] [k] strlen
1.17% ip [kernel.kallsyms] [k] __wake_up_common
1.09% ip [kernel.kallsyms] [k] insert_header
1.04% ip [kernel.kallsyms] [k] page_remove_rmap
1.01% ip [kernel.kallsyms] [k] consume_skb
0.98% ip [kernel.kallsyms] [k] netlink_trim
0.51% ip [kernel.kallsyms] [k] kernfs_link_sibling
0.51% ip [kernel.kallsyms] [k] filemap_map_pages
0.46% ip [kernel.kallsyms] [k] memcpy_erms
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 Sep 2017 23:27:05 +0000 (16:27 -0700)]
kobject: factorize skb setup in kobject_uevent_net_broadcast()
We can build one skb and let it be cloned in netlink.
This is much faster, and use less memory (all clones will
share the same skb->head)
Tested:
time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]
real 0m24.227s # instead of 0m52.554s
user 0m0.329s
sys 0m23.753s # instead of 0m51.375s
14.77% ip [kernel.kallsyms] [k] __ip6addrlbl_add
14.56% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
11.65% ip [kernel.kallsyms] [k] netlink_has_listeners
6.19% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.66% ip [kernel.kallsyms] [k] kobject_uevent_env
4.97% ip [kernel.kallsyms] [k] memset_erms
4.67% ip [kernel.kallsyms] [k] refcount_sub_and_test
4.41% ip [kernel.kallsyms] [k] _raw_read_lock
3.59% ip [kernel.kallsyms] [k] refcount_inc_not_zero
3.13% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.55% ip [kernel.kallsyms] [k] __wake_up
1.20% ip [kernel.kallsyms] [k] strlen
1.03% ip [kernel.kallsyms] [k] __wake_up_common
0.93% ip [kernel.kallsyms] [k] consume_skb
0.92% ip [kernel.kallsyms] [k] netlink_trim
0.87% ip [kernel.kallsyms] [k] insert_header
0.63% ip [kernel.kallsyms] [k] unmap_page_range
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 Sep 2017 23:27:04 +0000 (16:27 -0700)]
kobject: copy env blob in one go
No need to iterate over strings, just copy in one efficient memcpy() call.
Tested:
time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]
real 0m52.554s # instead of 1m7.492s
user 0m0.309s
sys 0m51.375s # instead of 1m6.875s
9.88% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
8.86% ip [kernel.kallsyms] [k] string
7.37% ip [kernel.kallsyms] [k] __ip6addrlbl_add
5.68% ip [kernel.kallsyms] [k] netlink_has_listeners
5.52% ip [kernel.kallsyms] [k] memcpy_erms
4.76% ip [kernel.kallsyms] [k] __alloc_skb
4.54% ip [kernel.kallsyms] [k] vsnprintf
3.94% ip [kernel.kallsyms] [k] format_decode
3.80% ip [kernel.kallsyms] [k] kmem_cache_alloc_node_trace
3.71% ip [kernel.kallsyms] [k] kmem_cache_alloc_node
3.66% ip [kernel.kallsyms] [k] kobject_uevent_env
3.38% ip [kernel.kallsyms] [k] strlen
2.65% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
2.20% ip [kernel.kallsyms] [k] kfree
2.09% ip [kernel.kallsyms] [k] memset_erms
2.07% ip [kernel.kallsyms] [k] ___cache_free
1.95% ip [kernel.kallsyms] [k] kmem_cache_free
1.91% ip [kernel.kallsyms] [k] _raw_read_lock
1.45% ip [kernel.kallsyms] [k] ksize
1.25% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.00% ip [kernel.kallsyms] [k] widen_string
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 19 Sep 2017 20:15:42 +0000 (13:15 -0700)]
net_sched: no need to free qdisc in RCU callback
gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller no longer needs to wait for a grace period. So this
patch gets rid of it.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jim Hanko [Tue, 19 Sep 2017 18:33:39 +0000 (11:33 -0700)]
team: fall back to hash if table entry is empty
If the hash to port mapping table does not have a valid port (i.e. when
a port goes down), fall back to the simple hashing mechanism to avoid
dropping packets.
Signed-off-by: Jim Hanko <hanko@drivescale.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test case for the rhlist interface.
While at it, cleanup current rhashtable test a bit and add a check
for max_size support.
No changes since v1, except in last patch.
kbuild robot complained about large onstack allocation caused by
struct rhltable when lockdep is enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_rhashtable: add test case for rhl_table interface
also test rhltable. rhltable remove operations are slow as
deletions require a list walk, thus test with 1/16th of the given
entry count number to get a run duration similar to rhashtable one.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
test_rhashtable: don't use global entries variable
pass the entries to test as an argument instead.
Followup patch will add an rhlist test case; rhlist delete opererations
are slow so we need to use a smaller number to test it.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Export b53_{enable,disable}_port and use these two functions in
bcm_sf2_port_setup and bcm_sf2_port_disable. The generic functions
cannot be used without wrapping because we need to manage additional
switch integration details (PHY, Broadcom tag etc.).
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: bcm_sf2: Use SF2_NUM_EGRESS_QUEUES for CFP
The magic number 8 in 3 locations in bcm_sf2_cfp.c actually designates
the number of switch port egress queues, so use that define instead of
open-coding it.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bcm_sf2 and b53 do exactly the same thing, so share that piece.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for enabling and disabling EEE, as well as re-negotiating it in
.adjust_link() and in .port_enable().
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Move the bcm_sf2 EEE-related functions to the b53 driver because this is shared
code amongst Gigabit capable switch, only 5325 and 5365 are too old to support
that.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for migrating the EEE code from bcm_sf2 to b53, define the full
EEE register page and offsets within that page.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The code to enable Broadcom tags/headers is largely switch independent,
and in preparation for enabling it for multiple devices with b53, move
the code we have in bcm_sf2.c to b53_common.c
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Use a macro to define I/O operations
Instead of repeating the same pattern: acquire mutex, read/write,
release mutex, define a macro: b53_build_op() which takes the type
(read|write), I/O size, and value (scalar or pointer). This helps with
fixing bugs that could exist (e.g: missing barrier, lock etc.).
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: bcm_sf2: Defer port enabling to calling port_enable
There is no need to configure the enabled ports once in bcm_sf2_sw_setup() and
then a second time around when dsa_switch_ops::port_enable is called, just do
it when port_enable is called which is better in terms of power consumption and
correctness.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Defer port enabling to calling port_enable
There is no need to configure the enabled ports once in b53_setup() and then a
second time around when dsa_switch_ops::port_enable is called, just do it when
port_enable is called which is better in terms of power consumption and
correctness.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Make b53_enable_cpu_port() take a port argument
In preparation for future changes allowing the configuring of multiple
CPU ports, make b53_enable_cpu_port() take a port argument.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA core overrides the master device's ethtool_ops structure so that
it can inject statistics and such of its dedicated switch CPU port.
This ethtool code is currently called on unnecessary conditions or
before the master interface and its switch CPU port get wired up.
This patchset fixes this.
Similarly to slave.c where the DSA slave net_device is the entry point
of the dsa_slave_* functions, this patchset also isolates the master's
ethtool code in a new master.c file, where the DSA master net_device is
the entry point of the dsa_master_* functions.
This is a first step towards better control of the master device and
support for multiple CPU ports.
====================
Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Sep 2017 15:57:00 +0000 (11:57 -0400)]
net: dsa: move master ethtool code
DSA overrides the master device ethtool ops, so that it can inject stats
from its dedicated switch CPU port as well.
The related code is currently split in dsa.c and slave.c, but it only
scopes the master net device. Move it to a new master.c DSA core file.
This file will be later extented with master net device specific code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Sep 2017 15:56:59 +0000 (11:56 -0400)]
net: dsa: setup master ethtool after dsa_ptr
DSA overrides the master's ethtool ops so that we can inject its CPU
port's statistics. Because of that, we need to setup the ethtool ops
after the master's dsa_ptr pointer has been assigned, not before.
This patch setups the ethtool ops after dsa_ptr is assigned, and
restores them before it gets cleared.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Sep 2017 15:56:58 +0000 (11:56 -0400)]
net: dsa: setup master ethtool unconditionally
When a DSA switch tree is meant to be applied, it already has a CPU
port. Thus remove the condition of dst->cpu_dp.
Moreover, the next lines access dst->cpu_dp unconditionally.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Tue, 19 Sep 2017 15:56:57 +0000 (11:56 -0400)]
net: dsa: remove copy of master ethtool_ops
There is no need to store a copy of the master ethtool ops, storing the
original pointer in DSA and the new one in the master netdev itself is
enough.
In the meantime, set orig_ethtool_ops to NULL when restoring the master
ethtool ops and check the presence of the master original ethtool ops as
well as its needed functions before calling them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 Sep 2017 12:14:24 +0000 (05:14 -0700)]
net: sk_buff rbnode reorg
skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp, while skb->dev is either NULL (TCP) or a constant for a given
queue (netem).
Since we plan using an RB tree for TCP retransmit queue to speedup SACK
processing with large BDP, this patch exchanges skb->dev and
skb->tstamp.
This saves some overhead in both TCP and netem.
v2: removes the swtstamp field from struct tcp_skb_cb
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Prepare for multicast router offload
Yotam says:
This patch-set makes various preparations needed for the multicast router
offloading, which include:
- Add the needed registers.
- Add needed ACL actions.
- Add new traps and trap groups.
- Exporting needed private structs and enums.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Add multicast router traps and trap groups
Add three new traps needed for multicast routing:
- PIM: Trap for PIM protocol control packets.
- RPF: Trap for packets that fail the RPF check on a specific hardware
route entry.
- MULTICAST: Generic trap for multicast. It is used for routes that trap
the packets to the CPU.
The RPF and MULTICAST traps have rate limiters as these traps may have
line-rate of packets trapped. The PIM trap has a rate limiter similarly to
other L3 control protocols. The rate limiters are implemented by adding
three new trap groups for the newly introduced traps.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Export RIF dev access function
The mlxsw_sp_rif struct, defined as private struct in spectrum_router.c
will be used in the multicast router source file. Due to the fact that the
dev field will be needed by the multicast router logic, add an access
function to it.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Configure RIF to forward IPv4 multicast packets by default
Turn on two bits on the Spectrum RIF configuration:
- IPv4 multicast: when a multicast packet arrives on a RIF, send it to go
through multicast routes lookup.
- IPv4 multicast forwarding enable: when multicast packet arrives on a
RIF, allow it to be forwarded by multicast routes. If this bit is not
set, multicast packets will go through multicast routing lookup but will
be dropped at the egress of the ports.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The RRCR register is used for copying and moving TCAM multicast routes
from different offsets. It will be used to allow routes relocation for
parman ops as part of the multicast router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: resources: Add multicast ERIF list entries resource
The multicast ERIF list entries resource indicates the number of entries
that can be put in one rigr2 register operation. While the register can
hold up to MLXSW_REG_RIGR2_MAX_ERIFS ( = 32) ERIF entries, the actual
number allowed by firmware is indicated with this resource.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Add the Router Interface Group Version 2 register
The RIGR-V2 register is used to add, remove and query egress interface list
of a multicast forwarding entry and it will be used by the multicast
router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Rename the flexible action set length field
The MLXSW_REG_PXXX_FLEX_ACTION_SET_LEN is relevant for the multicast router
registers too, so rename it to have a general name which is not bound to a
specific register.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: acl: Change trap ACL action to get the trap_id as a parameter
Allow the trap ACL action to be configured with different traps. This
allows the multicast router offloading code to use that same ACL action
with the multicast router traps. By using different traps, the multicast
router can have different trap policies and can handle the packet
differently.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Move ACL flexible actions instance to spectrum
A flexible action instance allows, given a set of ops, creating, committing
and sharing a set of ACL action blocks. The flexible action instance in
question is using the spectrum KVD linear space to store the flexible
action sets.
Move this flexible action instance to the common spectrum struct to allow
other users (such as multicast router) to get that functionality.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The multicast router offloading code is going to require the counter_pools
initialization to occur before the router initialization, thus, change the
spectrum initialization order to fix it.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the 'random' operation tests to include a delete operation
(delete half of the nodes from both lpm implementions and ensure
that lookups are still equivalent).
Also, add a simple IPv4 test which verifies lookup behavior as nodes
are deleted from the tree.
Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
bpf: Add uniqueness invariant to trivial lpm test implementation
The 'trivial' lpm implementation in this test allows equivalent nodes
to be added (that is, nodes consisting of the same prefix and prefix
length). For lookup operations, this is fine because insertion happens
at the head of the (singly linked) list and the first, best match is
returned. In order to support deletion, the tlpm data structue must
first enforce uniqueness. This change modifies the insertion algorithm
to search for equivalent nodes and remove them. Note: the
BPF_MAP_TYPE_LPM_TRIE already has a uniqueness invariant that is
implemented as node replacement.
Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE
This is a simple non-recursive delete operation. It prunes paths
of empty nodes in the tree, but it does not try to further compress
the tree as nodes are removed.
Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 18 Sep 2017 19:36:22 +0000 (12:36 -0700)]
net_sched: sch_htb: add per class overlimits counter
HTB qdisc overlimits counter is properly increased, but we have no per
class counter, meaning it is difficult to diagnose HTB problems.
This patch adds this counter, visible in "tc -s class show dev eth0",
with current iproute2.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 18 Sep 2017 11:40:38 +0000 (12:40 +0100)]
net_sched: use explicit size of struct tcmsg, remove need to declare tcm
Pointer tcm is being initialized and is never read, it is only being used
to determine the size of struct tcmsg. Clean this up by removing
variable tcm and explicitly using the sizeof struct tcmsg rather than *tcm.
Cleans up clang warning:
warning: Value stored to 'tcm' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
korina: performance fixes and cleanup
Changes from v1:
- use GRO instead of increasing ring size
- use NAPI_POLL_WEIGHT instead of defining own NAPI_WEIGHT
- optimize rx descriptor flags processing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Yeryomin [Sun, 17 Sep 2017 17:24:15 +0000 (20:24 +0300)]
net: korina: don't use overflow and underflow interrupts
When such interrupts occur there is not much we can do.
Dropping the whole ring doesn't help and only produces high packet loss.
If we just ignore the interrupt the mac will drop one or few packets instead of the whole ring.
Also this will lower the irq handling load and increase performance.
Signed-off-by: Roman Yeryomin <roman@advem.lv> Signed-off-by: David S. Miller <davem@davemloft.net>
Modify baycom driver to use the new parallel port device model.
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Acked-By: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
first batch of patches for 4.15. One larger item in there is Hans'
addition of new configuration options for flexible packet processing
('VNIC characteristics'). The patch descriptions have all the details.
Please apply.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: fold VLAN handling into l3_rebuild_skb()
Move the overly complicated VLAN processing from the L3 RX handler into
its l3_rebuild_skb() helper. No change in functionality.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Properly return any error encountered during VLAN processing to the
the caller.
Resulting change in behaviour: if SETVLAN fails while registering a
new VLAN ID, the stack no longer creates the corresponding vlan device.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use the right helpers to create/remove all attribute groups in one go.
Suggested-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: don't take queue lock in send_packet_fast()
Locking the output queue prior to TX is needed on OSA devices,
to synchronize against a packing flush from the TX completion code
(via qeth_check_outbound_queue()).
But send_packet_fast() is only used for IQDs, which don't do packing.
So remove the locking, and apply some easy cleanups.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: remove unused code in qdio_establish_cq()
Storing the number of input buffers into 'i' has no effect, it is
immediately re-assigned in the next line.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hans Wippel [Mon, 18 Sep 2017 19:18:16 +0000 (21:18 +0200)]
s390/qeth: add VNICC get/set timeout support
HiperSockets allow configuring so called VNIC Characteristics (VNICC)
that influence how the underlying hardware handles packets. For VNICCs,
additional commands for getting and setting timeouts are available.
Currently, the learning VNICC uses these commands.
* Learning VNICC: If learning is enabled on a qeth device, the device
learns the source MAC addresses of outgoing packets and incoming
packets to those learned MAC addresses are received.
For learning, the timeout specifies the idle period in seconds, after
which the underlying hardware removes a learned MAC address again.
This patch adds support for the IPA commands that are required to get
and set the current timeout values for the learning VNIC characteristic.
Also, it introduces the sysfs interface that allows users to configure
the timeout.
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hans Wippel [Mon, 18 Sep 2017 19:18:15 +0000 (21:18 +0200)]
s390/qeth: add VNICC enable/disable support
HiperSocket devices allow enabling and disabling so called VNIC
Characteristics (VNICC) that influence how the underlying hardware
handles packets. These VNICCs are:
* Flooding VNICC: Flooding allows specifying if packets to unknown
destination MAC addresses are received by the qeth device.
* Multicast flooding VNICC: Multicast flooding allows specifying if
packets to multicast MAC addresses are received by the qeth device.
* Learning VNICC: If learning is enabled on a qeth device, the device
learns the source MAC addresses of outgoing packets and incoming
packets to those learned MAC addresses are received.
* Takeover setvmac VNICC: If takeover setvmac is configured on a qeth
device, the MAC address of this device can be configured on a
different qeth device with the setvmac IPA command.
* Takeover by learning VNICC: If takeover learning is enabled on a qeth
device, the MAC address of this device can be learned (learning VNICC)
on a different qeth device.
* BridgePort invisible VNICC: If BridgePort invisible is enabled on a
qeth device, (1) packets from this device are not sent to a BridgePort
enabled qeth device and (2) packets coming from a BridgePort enabled
qeth device are not received by this device.
* Receive broadcast VNICC: Receive broadcast allows configuring if a
qeth device receives packets with the broadcast destination MAC
address.
This patch adds support for the IPA commands that are required to enable
and disable these VNIC characteristics on qeth devices. As a
prerequisite, it also adds the query commands IPA command.
The query commands IPA command allows requesting the supported commands
for each characteristic from the underlying hardware.
Additionally, this patch provides users with a sysfs user interface to
enable/disable the VNICCs mentioned above.
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hans Wippel [Mon, 18 Sep 2017 19:18:14 +0000 (21:18 +0200)]
s390/qeth: add basic VNICC support
VNIC Characteristics (VNICC) are features of HiperSockets that define
how packets are handled by the underlying network hardware. For example,
if the VNICC flooding is configured on a qeth device, ethernet frames to
unknown destination MAC addresses are received.
Currently, there is support for seven VNICCs: flooding, multicast
flooding, receive broadcast, learning, takeover learning, takeover
setvmac, bridge invisible. Also, six IPA commands exist for configuring
VNICCs on a qeth device: query characteristics, query commands, enable
characteristic, disable characteristic, set timeout, get timeout.
This patch adds the basic code infrastructure for VNICC support to qeth.
It allows querying VNICC support from the underlying hardware. To this
end, it adds:
* basic message formats for IPA commands
* basic data structures
* basic error handling
* query characteristics IPA command support
The query characteristics IPA command allows requesting the currently
supported and currently enabled VNIC characteristics from the underlying
hardware.
Support for the other IPA commands and for the configuration of VNICCs
is added in follow-up patches together with the respective user
interface functions.
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: renesas-ravb: Add support for R8A77995 RAVB
Add a new compatible string for the R8A77995 (R-Car D3) RAVB.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Tue, 12 Sep 2017 20:02:08 +0000 (23:02 +0300)]
ravb: document R8A77970 bindings
R-Car V3M (R8A77970) SoC also has the R-Car gen3 compatible EtherAVB
device, so document the SoC specific bindings.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: realtek: add RTL8201F phy-id and functions
Add RTL8201F phy-id and the related functions to the driver.
The original patch is as follows:
https://patchwork.kernel.org/patch/2538341/
Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: realtek: rename RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT
This renames the definition of page select register from
RTL8211F_PAGE_SELECT to RTL821x_PAGE_SELECT to use it across models.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
dummy: declare dummy devices as enumerated devices
Dummy device name is enumerated by the kernel, let user space be aware
of the naming scheme used by dummy devices:
(visible in /sys/class/net/<iface>/name_assign_type).
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI updates from Richard Weinberger:
"Minor improvements"
* tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs:
UBI: Fix two typos in comments
ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
Merge branch 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix
from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down,
from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian
Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
Documentation: link in networking docs
tcp: fix data delivery rate
bpf/verifier: reject BPF_ALU64|BPF_END
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp: fix an use-after-free issue in sctp_sock_dump
netvsc: increase default receive buffer size
tcp: update skb->skb_mstamp more carefully
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
net: smsc911x: Quieten netif during suspend
net: systemport: Fix 64-bit stats deadlock
net: vrf: avoid gcc-4.6 warning
qed: remove unnecessary call to memset
tg3: clean up redundant initialization of tnapi
tls: make tls_sw_free_resources static
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
MAINTAINERS: review Renesas DT bindings as well
net_sched: gen_estimator: fix scaling error in bytes/packets samples
nfp: wait for the NSP resource to appear on boot
nfp: wait for board state before talking to the NSP
...
Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes: 65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Andreas Rammhold <andreas@rammhold.de> Reported-by: Florian Klink <flokli@flokli.de> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 15 Sep 2017 23:47:42 +0000 (16:47 -0700)]
tcp: fix data delivery rate
Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.
Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>