attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/sysfs.h> work
with const attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/sysfs.h> work
with const attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/sysfs.h> work
with const attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/sysfs.h> work
with const attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/netdevice.h> work
with const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
11800 368 0 12168 2f88 drivers/net/can/janz-ican3.o
File size After adding 'const':
text data bss dec hex filename
11864 304 0 12168 2f88 drivers/net/can/janz-ican3.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/netdevice.h> work
with const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
6164 304 0 6468 1944 drivers/net/can/at91_can.o
File size After adding 'const':
text data bss dec hex filename
6228 240 0 6468 1944 drivers/net/can/at91_can.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
attribute_group are not supposed to change at runtime. All functions
working with attribute_group provided by <linux/netdevice.h> work
with const attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
13275 928 1 14204 377c drivers/net/usb/cdc_ncm.o
File size After adding 'const':
text data bss dec hex filename
13339 864 1 14204 377c drivers/net/usb/cdc_ncm.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Preparations for IPv6 UC router
Ido says:
The purpose of this set is to prepare the driver for the introduction of
IPv6 FIB offload. It's mainly composed of small and non-functional
changes, that either add the IPv6 equivalent of existing IPv4 code or
aimed at making the introduction of IPv6-specific code easier.
The first five patches enable IPv6 forwarding in the device and allow us
to configure router interfaces (RIFs) based on inet6addr notifications.
The next six patches add support for programming IPv6 neighbours into
the device's table as well as dumping their activity and updating the
kernel accordingly.
The last 11 patches extend current infrastructure to allow us to program
IPv6 routes, set catch-all IPv6 trap in case of abort and make the code
more receptive towards up-coming changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Update prefix count for IPv6
The number of possible prefix lengths for IPv6 is 129 and not 128.
Fixes following warning from UBSAN when /128 routes are offloaded:
UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2510:27 index 128 is out
of range for type 'long unsigned int [128]'
Fixes: 638a7b12eb3e ("mlxsw: spectrum_router: Implement private fib") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Mark IPv4 specific function accordingly
The functions to create and destroy a nexthop group are IPv4 specific
and should be renamed accordingly, so that they won't be confused with
the IPv6 specific functions in follow-up patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When we fail to insert a route we invoke the abort mechanism which
flushes all the tables and inserts a default route in each, so that all
packets incoming to the router will be trapped to the CPU.
Upon abort, add an IPv6 default route to the IPv6 tables.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Allow IPv6 routes to be programmed
Take advantage of previous patch and allow the RALUE register to be
called with IPv6 routes.
In order to re-use as much code as possible between IPv4 and IPv6, only
the lowest-level function that actually does the register packing is
demuxed based on the passed protocol.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Make FIB node retrieval family agnostic
A FIB node is an entity which stores routes sharing the same prefix and
length. The data structure itself is already family agnostic, but we
make some of its operations agnostic as well and thus re-use them for
IPv6 offload.
Instead of passing an IPv4-specific structure to fib4_node_get(), pass
general routing parameters and rename the function accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Don't assume neighbour type
Thankfully, the neighbour subsystem is agnostic to the upper protocol
and used by both IPv4 and IPv6. By removing assumptions regarding the
neighbour type we can thus re-use much of the neighbour-related code for
both IPv4 and IPv6.
For each nexthop, store its gateway IP and for nexthop group store the
neighbour table used by its nexthops.
Use this information throughout the code and remove assumption about the
neighbour type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Set activity interval according to both neighbour tables
The neighbours' activity is currently dumped according to the ARP
table's DELAY_PROBE time, but with the introduction of IPv6 offload we
should set the interval according to the minimum between the ARP and
ndisc tables.
Signed-off-by: Arkadi Sharshvesky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Reflect IPv6 neighbours to the device
As with IPv4, listen to NEIGH_UPDATE events from the ndisc table and
program relevant neighbours to the device's neighbour table.
Note that neighbours with a link-local IP address aren't programmed, as
packets with a link-local destination IP are trapped after LPM lookup
and never reach the neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Configure RIFs based on IPv6 addresses
When a netdev is configured with an IP address a router interface (RIF)
should be configured for it in the device. Allow configuration of RIFs
based on IPv6 address notifications as well as IPv4.
Note that the RIF exists as long as an IP address is configured on the
netdev, regardless of the address family.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Flood unregistered multicast packets to router
Up until now we only flooded broadcast packets to the router when an L3
interface was configured on top of a bridge. However, IPv6 Neighbour
Discovery packets are trapped to the CPU inside the router and these can
be sent with a multicast address.
Flood unregistered multicast packets to the router port, so that
relevant packets could be trapped there.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Before we can start using IPv6, we need to trap certain control packets
to the CPU. Among others, these include Neighbour Discovery, DHCP and
neighbour misses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Jul 2017 18:13:42 +0000 (11:13 -0700)]
Merge branch 'xfrm-remove-flow-cache'
Florian Westphal says:
====================
xfrm: remove flow cache
After RCU-ification of ipsec packet path there are no major scalability
issues anymore without flow cache.
We still incur a performance hit, which comes mostly from the extra xfrm
dst allocation/freeing.
The last patch in the series adds a simple percpu cache to avoid the
extra allocation if a packet matched the same policies as last one.
The main concern with this is that we will see performance drops,
especially with large numbers of policies/SAs.
However, during hallway discussions at nfws 2017 it seemed the issues
with flow caching outweight the removal downsides, and that it
might be best to just 'remove it' and see where the practical issues
(if any) will appear.
It should now be possible to also remove the genid member in the policies
as we don't hold bundles for prolonged time anymore, but I think
this change is controversial (and intrusive) enough as-is, so defer
that to a later point in time.
Changes since last rfc:
- fix build failures due to implicit interrupt.h includes
- rework last patch (pcpu cache):
* avoid xchg()
* check policies for walk.dead = 1 instead of more costly bundle_ok().
* flush pcpu bundles when sa/policies get removed, to allow module
references to go away (suggested by Ilan Tayari)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.
The cache will not help with strict RR workloads as there is no hit.
The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.
We need to run the dst_release on the correct cpu to avoid races with
packet path. This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().
Test results using 4 network namespaces and null encryption:
Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
these drivers use tasklets or irq apis, but don't include interrupt.h.
Once flow cache is removed the implicit interrupt.h inclusion goes away
which will break the build.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series removes the remaining capabilities as well as the
flags bitmap in the info structures. Most of them are turned into ops,
or new info members.
There is no mv88e6xxx_cap enum or bitmap flags anymore, only
mv88e6xxx_info and mv88e6xxx_ops structures.
While reviewing and documenting the related G2 registers, fix a few
inconsistencies: 88E6185 has no interrupt in G2 and 88E6390 has a POT.
Except these two adjustments, there is no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:46 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add a multi_chip info flag
Instead of relying on a bitmap flag, add a new multi_chip info flag to
describe the presence of the indirect SMI access though the two device
registers 0x0 and 0x1.
All remaining capabilities and flags are now unused. Remove the
mv88e6xxx_cap enum and the info flags bitmaps.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:45 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add Energy Detect ops
The 88E6352 family supports Energy Detect and has one bit for Sense and
one bit for periodically transmit NLP (Energy Detect+TM). The 88E6390
family adds another bit to distinguish Auto or SW wake-up. Chips
supporting EEE all have an EEE Enabled bit in the Port Status Register.
This patch adds new ops for the PHY Energy Detect accesses.
This also allows us to get rid of the MV88E6XXX_FLAG_EEE flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:44 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add a global2_addr info flag
Similarly to global1_addr, add a global2_addr member in the info
structure to describe the presence of the Global 2 Registers.
This allows us to get rid of the MV88E6XXX_FLAG_GLOBAL2 flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:43 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add POT operation
Add a pot_clear operation to clear the Priority Override Table and wrap
its call into a mv88e6xxx_pot_setup helper.
This allows us to get rid of the MV88E6XXX_FLAG_G2_POT flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6390 family clear the Priority Override Table the same way as 88E6352, thus add MV88E6XXX_FLAG_G2_POT to MV88E6XXX_FLAGS_FAMILY_6390.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:41 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU
The 88E6185 family only has one 16-bit register to mark the 16 802.1D
reserved multicast addresses in the range of 01:80:C2:00:00:0x as MGMT.
The 88E6352 family also has one 16-bit register to mark the 16 GARP
reserved multicast addresses in the range of 01:80:C2:00:00:2x as MGMT.
Split the existing mv88e6095 prefixed mgmt_rsvd2cpu operation into two
distinct mv88e6185 and mv88e6352 prefixed operations, and wrap its call
into a mv88e6xxx_rsvd2cpu_setup helper.
This allows us to also get rid of the MV88E6XXX_CAP_G2_MGMT_EN_* flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:40 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add number of Global 2 IRQs
Similarly to g1_irqs, add a g2_irqs member to the info structure to
indicates the presence of the Global 2 Interrupt Source and Mask
registers.
At the same time, provide helpers and document the registers since they
differ a bit between 88E6352 and 88E6390 families.
This allows us to get rid of the MV88E6XXX_FLAG_G2_INT flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6185 family has no Global 2 Interrupt Source or Mask registers.
Remove the MV88E6XXX_FLAG_G2_INT from MV88E6XXX_FLAGS_FAMILY_6185.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:38 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: remove unused capabilities
Remove the forgotten capabilities and related flags from previous
cleanups.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6XXX_FAMILY_6321 is undefined, 88E6321's family is 88E6320,
fix this.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:36 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: remove LED control register
We don't support LED control yet, remove its register definition.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 17 Jul 2017 17:03:35 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: remove unneeded dsa header
phy.c does not need to include the DSA public header. Remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Tue, 18 Jul 2017 04:56:48 +0000 (21:56 -0700)]
net: fix build error in devmap helper calls
Initial patches missed case with CONFIG_BPF_SYSCALL not set.
Fixes: 85d93df2d36f ("xdp: Add batching support to redirect map") Fixes: e3e70468e656 ("bpf: add bpf_redirect_map helper routine") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
5113 384 0 5497 1579 drivers/net/ethernet/ec_bhf.o
File size After adding 'const':
text data bss dec hex filename
5177 320 0 5497 1579 drivers/net/ethernet/ec_bhf.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
791 336 0 1127 467 net/ethernet/cadence/macb_pci.o
File size After adding 'const':
text data bss dec hex filename
855 272 0 1127 467 net/ethernet/cadence/macb_pci.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Revert "net: add function to allocate sk_buff head without data area"
It was added for netlink mmap tx, there are no callers in the tree.
The commit also added a check for skb->head != NULL in kfree_skb path,
remove that too -- all skbs ought to have skb->head set.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jul 2017 16:48:07 +0000 (09:48 -0700)]
Merge branch 'xdp-redirect'
John Fastabend says:
====================
Implement XDP bpf_redirect
This series adds two new XDP helper routines bpf_redirect() and
bpf_redirect_map(). The first variant bpf_redirect() is meant
to be used the same way it is currently being used by the cls_bpf
classifier. An xdp packet will be redirected immediately when this
is called.
The other variant bpf_redirect_map(map, key, flags) uses a new
map type called devmap. A devmap uses integers as keys and
net_devices as values. The user provies key/ifindex pairs to
update the map with new net_devices. This provides two benefits
over the normal variant 'bpf_redirect()'. First the datapath
bpf program is abstracted away from using hard-coded ifindex
values. Allowing a single bpf program to be run any many different
environments. Second, and perhaps more important, the map enables
batching packet transmits. The map plus small driver changes
allows for batching all send requests across a NAPI poll loop.
This allows driver writers to optimize the driver xmit path
and only call expensive operations once for a batch of xdp_buffs.
The devmap was designed to support possible future work for
multicast and broadcast as follow-up patches.
To see, in more detail, how to leverage the new helpers and
map from the userspace side please review these two patches,
xdp: sample program for new bpf_redirect helper
xdp: bpf redirect with map sample program
Performance numbers provided by Jesper are the following, tested
using the ixgbe driver with CPU E5-1650 v4 @ 3.60GHz:
13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX with swap mac (writes into pkt)
7,596,576 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
A big thanks to everyone who helped with this series. Jesper
provided fixes, debugging, code review, performance benchmarks!
Daniel provided lots of useful feedback and code review. And last
but not least Andy provided useful feedback related to supporting
additional drivers, generic xdp implementation, testing, etc. Any
other feedback is welcome but I believe at this point these are
ready to be merged!
Whats left... get the rest of the drivers developers to implement
this in all the drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:30:25 +0000 (09:30 -0700)]
xdp: bpf redirect with map sample program
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:30:02 +0000 (09:30 -0700)]
net: add notifier hooks for devmap bpf map
The BPF map devmap holds a refcnt on the net_device structure when
it is in the map. We need to do this to ensure on driver unload we
don't lose a dev reference.
However, its not very convenient to have to manually unload the map
when destroying a net device so add notifier handlers to do the cleanup
automatically. But this creates a race between update/destroy BPF
syscall and programs and the unregister netdev hook.
Unfortunately, the best I could come up with is either to live with
requiring manual removal of net devices from the map before removing
the net device OR to add a mutex in devmap to ensure the map is not
modified while we are removing a device. The fallout also requires
that BPF programs no longer update/delete the map from the BPF program
side because the mutex may sleep and this can not be done from inside
an rcu critical section. This is not a real problem though because I
have not come up with any use cases where this is actually useful in
practice. If/when we come up with a compelling user for this we may
need to revisit this.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:29:40 +0000 (09:29 -0700)]
xdp: Add batching support to redirect map
For performance reasons we want to avoid updating the tail pointer in
the driver tx ring as much as possible. To accomplish this we add
batching support to the redirect path in XDP.
This adds another ndo op "xdp_flush" that is used to inform the driver
that it should bump the tail pointer on the TX ring.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:29:18 +0000 (09:29 -0700)]
bpf: add bpf_redirect_map helper routine
BPF programs can use the devmap with a bpf_redirect_map() helper
routine to forward packets to netdevice in map.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:28:56 +0000 (09:28 -0700)]
bpf: add devmap, a map for storing net device references
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.
The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.
Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:28:35 +0000 (09:28 -0700)]
xdp: add trace event for xdp redirect
This adds a trace event for xdp redirect which may help when debugging
XDP programs that use redirect bpf commands.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:28:12 +0000 (09:28 -0700)]
ixgbe: add initial support for xdp redirect
There are optimizations we can add after the basic feature is
enabled. But, for now keep the patch simple.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:27:50 +0000 (09:27 -0700)]
net: implement XDP_REDIRECT for xdp generic
Add support for redirect to xdp generic creating a fall back for
devices that do not yet have support and allowing test infrastructure
using veth pairs to be built.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:27:28 +0000 (09:27 -0700)]
xdp: sample program for new bpf_redirect helper
This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:27:07 +0000 (09:27 -0700)]
xdp: add bpf_redirect helper function
This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.
In order to support drivers handling a xdp_buff natively this patches
uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
to the specified device.
If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:26:45 +0000 (09:26 -0700)]
net: xdp: support xdp generic on virtual devices
XDP generic allows users to test XDP programs and/or run them with
degraded performance on devices that do not yet support XDP. For
testing I typically test eBPF programs using a set of veth devices.
This allows testing topologies that would otherwise be difficult to
setup especially in the early stages of development.
This patch adds a xdp generic hook to the netif_rx_internal()
function which is called from dev_forward_skb(). With this addition
attaching XDP programs to veth devices works as expected! Also I
noticed multiple drivers using netif_rx(). These devices will also
benefit and generic XDP will work for them as well.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Mon, 17 Jul 2017 16:26:24 +0000 (09:26 -0700)]
ixgbe: NULL xdp_tx rings on resource cleanup
tx_rings and rx_rings are cleaned up on close paths in ixgbe driver
however, xdp_rings are not. Set the xdp_rings to NULL here so that
we can use the pointer to indicate if the XDP rings are initialized.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jul 2017 16:19:40 +0000 (09:19 -0700)]
Merge branch 'mlxsw-traps'
Jiri Pirko says:
====================
mlxsw: Traps enhancements
Ido says:
The first patch makes sure the driver marks packets that were trapped
in the router and might have already been flooded by the bridge, so that
the bridge driver won't flood them again. This isn't critical at this time
point, but will be when Neighbour Discovery traps are introduced as these
are multicast packets that are trapped in the router.
The second and third patches add new traps - for MLD and Router Alert
packets. The last patch takes advantage of that and floods IPv6
unregistered multicast packets only to mrouter ports instead of all ports.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now IPv6 unregistered multicast traffic would be flooded like
broadcast, even when MLD snooping was enabled on the bridge. This was
intentional as MLD packet traps were missing, preventing the bridge
driver from programming MDB entries to the device.
Previous patch added these traps, so we can now finally flood IPv6
unregistered multicast packets to specific ports via the multicast table
instead of flooding them to all ports via the broadcast table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In commit fd33e22ad6a3 ("mlxsw: spectrum: Mirror certain packets to
CPU") we marked packets that were mirrored to the CPU, so that they
won't be flooded again by the bridge driver.
However, certain packets are trapped in the device's router block, after
passing through the bridge block where they were potentially flooded.
Mark all packets coming from L3 traps, so that they won't be potentially
flooded again by the bridge driver.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>