Alexander Duyck [Thu, 2 Mar 2017 23:01:36 +0000 (15:01 -0800)]
ixgbe: Fix output from ixgbe_dump
I just found that when we had changed the Rx path to check for length
instead of the DD bit we introduced an issue in ixgbe_dump since we were no
longer clearing the status bits.
To correct this I am updating ixgbe_dump to look for the length bits in the
descriptor since that is what we are using in the Rx path.
Fixes: c3630cc40b4f ("ixgbe: Use length to determine if descriptor is done") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
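The idea of treating a non-zero length as the completion signal can be sketched as follows. This is only an illustration: the struct layout and the names rx_desc_wb, RX_STAT_DD and rx_desc_done() are assumptions made for this example, not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical write-back descriptor; field names are assumptions. */
struct rx_desc_wb {
	uint32_t status_error;	/* may still hold a stale DD bit */
	uint16_t length;	/* non-zero once a frame was written */
};

#define RX_STAT_DD 0x01u	/* assumed "descriptor done" status bit */

/* Report completion from the length field rather than from the DD bit. */
static bool rx_desc_done(const struct rx_desc_wb *desc)
{
	return desc->length != 0;
}
```

With this check a descriptor whose status bits were never cleared is not reported as done unless its length is also set.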
Alexander Duyck [Thu, 2 Mar 2017 23:01:05 +0000 (15:01 -0800)]
ixgbe: Add support for maximum headroom when using build_skb
This patch increases the headroom allocated when using build_skb on a
system with 4K pages. Specifically the breakdown of headroom versus cache
size is as follows:
  L1 Cache Size            Headroom
  64                       192
  64, NET_IP_ALIGN == 2    194
  128                      128
  128, NET_IP_ALIGN == 2   130
  256                      512
  256, NET_IP_ALIGN == 2   258
I stopped at supporting only a cache line size of 256 as that was the
largest cache size I could find supported in the kernel.
With this we are guaranteeing at least 128 bytes of headroom to spare in
the frame. This should be enough for us to insert a couple of IPv6 headers
if needed which is likely enough room for anything XDP should need.
I'm leaving the padding for systems with pages larger than 4K unmodified
for now. XDP currently isn't really setup to work on those types of
systems so we can cross that bridge when we get there.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
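One way to arrive at the headroom numbers in the table is sketched below. It is a userspace approximation under stated assumptions, not the driver code: a 4K page split into two 2K halves, sizeof(struct skb_shared_info) taken as 320 bytes, NET_SKB_PAD taken as max(32, cache line size), and a switch from a 1536 byte buffer to a 3K buffer when the padded frame no longer fits in 2K. All SKETCH_* names and sketch_headroom() are hypothetical.

```c
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_SHINFO_SIZE	320	/* assumed sizeof(struct skb_shared_info) */
#define SKETCH_RXBUFFER_1536	1536
#define SKETCH_RXBUFFER_2K	2048
#define SKETCH_RXBUFFER_3K	3072

/* Round x up to the next multiple of a. */
static int align_up(int x, int a)
{
	return (x + a - 1) / a * a;
}

/* Space left in a buffer once the cache-aligned shared info is reserved. */
static int with_overhead(int size, int cache_line)
{
	return size - align_up(SKETCH_SHINFO_SIZE, cache_line);
}

/* Headroom for a given L1 cache line size and NET_IP_ALIGN value. */
static int sketch_headroom(int cache_line, int net_ip_align)
{
	int net_skb_pad = cache_line > 32 ? cache_line : 32;
	int rx_buf_len;

	/* If a 2K buffer cannot hold pad + 1536, size for a 3K buffer. */
	if (net_skb_pad + SKETCH_RXBUFFER_1536 >
	    with_overhead(SKETCH_RXBUFFER_2K, cache_line))
		rx_buf_len = SKETCH_RXBUFFER_3K +
			     align_up(net_ip_align, cache_line);
	else
		rx_buf_len = SKETCH_RXBUFFER_1536;

	/* Make room for NET_IP_ALIGN if it is non-zero. */
	rx_buf_len -= net_ip_align;

	return with_overhead(align_up(rx_buf_len, SKETCH_PAGE_SIZE / 2),
			     cache_line) - rx_buf_len;
}
```

Under these assumptions sketch_headroom(64, 0) gives 192 and sketch_headroom(256, 2) gives 258, matching the table.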
Tony Nguyen [Wed, 1 Mar 2017 19:52:09 +0000 (11:52 -0800)]
ixgbe: add check for VETO bit when configuring link for KR
We did not have a check in place for MMNGC.MNG_VETO when setting up link
on X550EM_X KR devices which resulted in link loss for the BMC when
loading the driver.
This patch adds a check for ixgbe_check_reset_blocked() in setup_link()
since in that case there is no PHY reset function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Philippe Reynes [Tue, 7 Feb 2017 15:56:33 +0000 (16:56 +0100)]
ixgbevf: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
As I don't have the hardware, I'd be very pleased if
someone could test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Thu, 2 Feb 2017 19:38:46 +0000 (14:38 -0500)]
ixgbe: Remove unused define
Remove the Marvell 1145 PHY define as we have never had a device that
supports it and have no plan to in the future. The existence of this
define has caused confusion about whether or not this PHY was supported
by ixgbe.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:56 +0000 (14:11 -0800)]
ixgbe: do not use adapter->num_vfs when setting VFs via module parameter
Avoid setting adapter->num_vfs early in the init code path when
using the max_vfs module parameter by passing it to ixgbe_enable_sriov()
as a function parameter.
This fixes an issue where, if we failed to allocate vfinfo in
__ixgbe_enable_sriov(), the driver would crash with a NULL pointer
dereference in ixgbe_disable_sriov() when attempting to free the vfinfo
struct based
on adapter->num_vfs. Also it cleans up the assignment of adapter->num_vfs
since now it will only be set in __ixgbe_enable_sriov() and cleared in
ixgbe_disable_sriov().
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
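The ordering described above, where the VF count is committed only once the allocation has succeeded, can be sketched in userspace as follows. The struct and function names (adapter_sketch, enable_sriov_sketch, disable_sriov_sketch) and the simulate_oom flag are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Hypothetical, simplified adapter state. */
struct adapter_sketch {
	unsigned int num_vfs;
	int *vfinfo;	/* stands in for the per-VF info array */
};

/* num_vfs is passed in and only stored after vfinfo was allocated. */
static int enable_sriov_sketch(struct adapter_sketch *a, unsigned int max_vfs,
			       int simulate_oom)
{
	if (!max_vfs)
		return 0;

	a->vfinfo = simulate_oom ? NULL : calloc(max_vfs, sizeof(*a->vfinfo));
	if (!a->vfinfo)
		return -1;

	a->num_vfs = max_vfs;
	return 0;
}

/* Safe after a failed enable: num_vfs is still 0, so vfinfo is not touched. */
static void disable_sriov_sketch(struct adapter_sketch *a)
{
	unsigned int i;

	for (i = 0; i < a->num_vfs; i++)
		a->vfinfo[i] = 0;

	free(a->vfinfo);
	a->vfinfo = NULL;
	a->num_vfs = 0;
}
```

Had num_vfs been set before the allocation, the loop in the disable path would walk a NULL array after a failed enable.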
Emil Tantilov [Fri, 20 Jan 2017 22:11:50 +0000 (14:11 -0800)]
ixgbe: return early instead of wrap block in if statement
Since we exit at the end of the block, we can save a level of
indentation by performing an early return, and make the next several
sections of code more legible, with fewer 80 character line breaks.
Also moved allocating vfinfo at the beginning and the notification
for enabling SRIOV at the end of the function when we know that it
will succeed.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 20 Jan 2017 22:11:45 +0000 (14:11 -0800)]
ixgbe: move num_vfs_macvlans allocation into separate function
Move the code allocating memory for list of MAC addresses that
the VFs can use for MACVLAN into its own function.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 19 Jan 2017 23:55:12 +0000 (15:55 -0800)]
ixgbe: add default setup_link for x550em_a MAC type
Add default setting for mac->ops.setup_link on x550em_a MAC types.
This fixes a link issue on KR parts.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Sat, 31 Dec 2016 02:07:58 +0000 (21:07 -0500)]
ixgbe: Add X552 XFI backplane support
This patch adds support for the X552 XFI backplane interface. The XFI
backplane requires a custom tuned link. HW/FW owns the link config
for the XFI backplane and SW must not interfere with it.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Joe Perches [Tue, 3 Jan 2017 15:28:11 +0000 (07:28 -0800)]
ixgbe: Remove pr_cont uses
As pr_cont output can be interleaved by other processes,
using pr_cont should be avoided where possible.
Miscellanea:
- Use a temporary pointer to hold the next descriptions and
consolidate the pr_cont uses
- Use the temporary buffer to hold the 8 u32 register values and
emit those in a single go
- Coalesce formats and logging neatening around those changes
- Fix a defective output for the rx ring entry description when
also emitting rx_buffer_info data
This reduces overall object size a tiny bit too.
$ size drivers/net/ethernet/intel/ixgbe/*.o*
   text    data     bss     dec     hex filename
  62167     728      12   62907    f5bb drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.new
  62273     728      12   63013    f625 drivers/net/ethernet/intel/ixgbe/ixgbe_main.o.old
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
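The "temporary buffer, emitted in a single go" approach from the list above can be sketched in userspace like this. The function name format_regs() and the output layout are assumptions for the example; a kernel version would hand the finished buffer to a single pr_info() call instead of building the line with several pr_cont() calls.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a register name and 8 u32 values into buf so that the whole
 * line can be logged with one call. Returns the number of characters
 * written (not counting the terminating NUL).
 */
static int format_regs(char *buf, size_t len, const char *name,
		       const uint32_t regs[8])
{
	int off = snprintf(buf, len, "%-15s", name);
	int i;

	for (i = 0; i < 8 && off > 0 && (size_t)off < len; i++)
		off += snprintf(buf + off, len - off, " %08x",
				(unsigned int)regs[i]);

	return off;
}
```

Because the line is complete before it is printed, output from other contexts cannot be interleaved in the middle of it.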
David S. Miller [Tue, 18 Apr 2017 18:11:10 +0000 (14:11 -0400)]
Merge branch 'ftgmac100-batch5-features'
Benjamin Herrenschmidt says:
====================
ftgmac100: Rework batch 5 - Features
This is the third spin of the fifth and last batch of
updates to the ftgmac100 driver.
This contains a few additional "features" such as:
- Support for ethtool n-way reset
- Multicast filtering & promisc support
- Vlan offload
- netpoll
And a couple of misc bits. This also adds the device-tree binding
documentation.
v2. - Addresses review comments and adds a new patch fixing a
theoretical ordering issue in my new NAPI poll implementation
- Add a bug fix (Patch 8/9) for a potential ordering issue
in the new NAPI poll code.
v3. - Rebase on net-next (fix conflict with an unrelated #include
change series)
- Update DT bindings better describing accepted phy-mode values
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ftgmac100: Fix potential ordering issue in NAPI poll
We need to ensure the loads from the descriptor are done after the
MMIO store clearing the interrupts has completed, otherwise we
might still miss work.
A read back from the MMIO register will "push" the posted store and
ioread32 has a barrier on weakly ordered architectures that will
order subsequent accesses.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ftgmac100: Add pause frames configuration and support
Hopefully my understanding of how the hardware works is correct,
as the documentation isn't completely clear. So far I have seen
no obvious issue. Pause also seems to work with NC-SI.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 12:32:14 +0000 (14:32 +0200)]
net: pxa168_eth: Use kcalloc() in two functions
Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
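The benefit of an array allocator over an open-coded multiplication is the overflow check on count * size. A userspace stand-in is sketched below; alloc_array_zeroed() is a hypothetical helper written for this example and is not the kernel's kcalloc() implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n elements of the given size, refusing
 * requests whose total size would overflow size_t.
 */
static void *alloc_array_zeroed(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would wrap around */

	return calloc(n, size);
}
```

An open-coded malloc(n * size) would silently allocate a too-small buffer in the overflow case.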
Markus Elfring [Mon, 17 Apr 2017 08:52:02 +0000 (10:52 +0200)]
net: mvpp2: Fix a jump label position in mvpp2_rx()
The script "checkpatch.pl" pointed out that labels should not be indented.
Thus delete two horizontal tabs before the jump label "err_drop_frame"
in the function "mvpp2_rx".
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 08:40:32 +0000 (10:40 +0200)]
net: mvpp2: Improve a size determination in two functions
Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
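The sizeof convention referred to here can be illustrated with a small userspace sketch. The type txq_sketch and the function txq_alloc_sketch() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical per-queue structure. */
struct txq_sketch {
	int id;
	long count;
};

static struct txq_sketch *txq_alloc_sketch(void)
{
	struct txq_sketch *txq;

	/*
	 * sizeof(*txq) follows the pointer's type automatically if the
	 * declaration above is ever changed; sizeof(struct txq_sketch)
	 * would have to be updated by hand.
	 */
	txq = calloc(1, sizeof(*txq));
	return txq;
}
```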
Markus Elfring [Mon, 17 Apr 2017 08:30:29 +0000 (10:30 +0200)]
net: mvpp2: Improve 27 size determinations
Replace the specification of data structures by references to
a local variable as the parameter for the operator "sizeof"
to make the corresponding size determination a bit safer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 07:12:34 +0000 (09:12 +0200)]
net: mvpp2: Improve another size determination in mvpp2_prs_default_init()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 07:06:33 +0000 (09:06 +0200)]
net: mvpp2: Improve another size determination in mvpp2_bm_init()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 06:55:42 +0000 (08:55 +0200)]
net: mvpp2: Improve another size determination in mvpp2_port_probe()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 06:48:23 +0000 (08:48 +0200)]
net: mvpp2: Improve another size determination in mvpp2_init()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 06:38:32 +0000 (08:38 +0200)]
net: mvpp2: Improve two size determinations in mvpp2_probe()
Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Mon, 17 Apr 2017 06:09:07 +0000 (08:09 +0200)]
net: mvpp2: Use kmalloc_array() in mvpp2_txq_init()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Sun, 16 Apr 2017 20:11:22 +0000 (22:11 +0200)]
net: mvneta: Use kmalloc_array() in mvneta_txq_init()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Sun, 16 Apr 2017 19:45:38 +0000 (21:45 +0200)]
net: mvneta: Improve two size determinations in mvneta_init()
Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Sun, 16 Apr 2017 19:23:19 +0000 (21:23 +0200)]
net: mvneta: Use devm_kmalloc_array() in mvneta_init()
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "devm_kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 83e7e4ce9e93c3 ("mac80211: Use rhltable instead of rhashtable")
removed the last user that made use of 'insecure_elasticity' parameter,
i.e. the default of 16 is used everywhere.
Replace it with a constant.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 15 Apr 2017 14:00:29 +0000 (22:00 +0800)]
sctp: process duplicated strreset asoc request correctly
This patch is to fix the replay attack issue for strreset asoc requests.
When a duplicated strreset asoc request is received, reply to it with a bad
seqno if its seqno < asoc->strreset_inseq - 2, and reply to it with the
result saved in asoc if its seqno >= asoc->strreset_inseq - 2.
But note that if the result saved in asoc is performed, the sender's next
tsn and receiver's next tsn for the response chunk should be set. It's
safe to get them from asoc: if they have changed, the peer has already
received the response, and the new response with the wrong tsn won't
be accepted by the peer.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 15 Apr 2017 14:00:28 +0000 (22:00 +0800)]
sctp: process duplicated strreset in and addstrm in requests correctly
This patch is to fix the replay attack issue for strreset and addstrm in
requests.
When a duplicated strreset in or addstrm in request is received, reply to it
with a bad seqno if its seqno < asoc->strreset_inseq - 2, and reply to it
with the result saved in asoc if its seqno >= asoc->strreset_inseq - 2.
For a strreset in or addstrm in request, if the receiver side processes it
successfully, a strreset out or addstrm out request (as a response for that
request) will be sent back to the peer. reconf_time will retransmit the out
request even if it's lost.
So when a duplicated strreset in or addstrm in request is received and its
result was performed, we should not reply to the request, but drop it instead.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 15 Apr 2017 14:00:27 +0000 (22:00 +0800)]
sctp: process duplicated strreset out and addstrm out requests correctly
Now sctp stream reconf will process a request again even if its seqno is
less than asoc->strreset_inseq.
If one request has been done successfully and some data chunks have been
accepted and then a duplicated strreset out request comes, the streamin's
ssn will be cleared. That stream will then never receive chunks again
because of the unsynchronized ssn. This allows a replay attack.
A similar issue also exists when processing addstrm out requests. It
causes extra streams to be added.
This patch fixes it by saving the last 2 results into asoc. When a
duplicated strreset out or addstrm out request is received, reply to it
with a bad seqno if its seqno < asoc->strreset_inseq - 2, and reply to it
with the result saved in asoc if its seqno >= asoc->strreset_inseq - 2.
Note that it saves the last 2 results instead of only the last 1 result,
because two requests can be sent together in one chunk.
And note that when receiving a duplicated request, the receiver side will
still reply to it even if the peer has received the response. It's safe, as
the response will be dropped by the peer.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
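The seqno rule described above can be sketched as a small classifier. This is an illustration under assumptions: the enum, the function names, the wraparound-safe comparison, the choice to treat a seqno ahead of strreset_inseq as bad, and the mapping of a duplicate to a slot in the two saved results are all hypothetical details added for the example.

```c
#include <stdint.h>

enum req_action {
	REQ_PROCESS,		/* seqno is the one currently expected */
	REQ_REPLY_SAVED,	/* recent duplicate: resend a stored result */
	REQ_BAD_SEQNO		/* too old or unexpected: reject */
};

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Classify a request seqno against the expected inseq. For a recent
 * duplicate, *saved_idx is set to 0 for inseq - 1 and 1 for inseq - 2.
 */
static enum req_action classify_request(uint32_t inseq, uint32_t seq,
					 int *saved_idx)
{
	if (seq == inseq)
		return REQ_PROCESS;

	/* Assumption: a seqno ahead of inseq is also answered as bad. */
	if (seq_before(inseq, seq) || seq_before(seq, inseq - 2))
		return REQ_BAD_SEQNO;

	*saved_idx = (int)(inseq - seq - 1);
	return REQ_REPLY_SAVED;
}
```

Keeping two slots matches the note above that two requests can arrive together in one chunk.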
Chonggang Li [Sun, 16 Apr 2017 19:02:18 +0000 (12:02 -0700)]
bonding: deliver link-local packets with skb->dev set to link that packets arrived on
Bonding driver changes the skb->dev to the bonding-master before
passing the packet to stack for further processing. This, however
does not make sense for the link-local packets and it loses "the
link info" once its skb->dev is changed to bonding-master. This
patch changes this behavior for link-local packets by not changing
the skb->dev to the bonding-master and maintaining it as it is,
i.e. the link on which the packet arrived.
Signed-off-by: Chonggang Li <chonggangli@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
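A minimal sketch of the resulting delivery decision is shown below. It assumes that "link-local" refers to destination MAC addresses in the reserved group range 01:80:C2:00:00:00 to 01:80:C2:00:00:0F; the function names and the integer device ids are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the reserved range 01:80:C2:00:00:00 - 01:80:C2:00:00:0F. */
static bool is_link_local_sketch(const uint8_t dst[6])
{
	return dst[0] == 0x01 && dst[1] == 0x80 && dst[2] == 0xC2 &&
	       dst[3] == 0x00 && dst[4] == 0x00 && (dst[5] & 0xF0) == 0x00;
}

/* Pick the device the frame is delivered with. */
static int deliver_dev_sketch(const uint8_t dst[6], int slave_dev,
			      int master_dev)
{
	/* Link-local frames keep the link they arrived on. */
	return is_link_local_sketch(dst) ? slave_dev : master_dev;
}
```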
David Ahern [Sun, 16 Apr 2017 16:48:24 +0000 (09:48 -0700)]
net: rtnetlink: plumb extended ack to doit function
Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.
This is the first step to using extended error reporting in rtnetlink.
From here individual subsystems can be updated to set netlink_ext_ack as
needed.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Sun, 16 Apr 2017 10:27:14 +0000 (12:27 +0200)]
ipv6: sr: fix BUG due to headroom too small after SRH push
When a locally generated packet receives an SRH with two or more segments,
the remaining headroom is too small to push an ethernet header. This patch
ensures that the headroom is large enough after SRH push.
Fixes: 19d5a26f5ef8de5dcb78799feaf404d717b1aac3 ("ipv6: sr: expand skb head only if necessary") Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Sun, 16 Apr 2017 08:00:07 +0000 (11:00 +0300)]
gso: Validate assumption of frag_list segementation
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it is not true.
Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
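The assumption being validated can be expressed as a simple check over per-SKB payload lengths. The sketch below works on a plain array instead of real SKBs, and frag_list_uniform() is a hypothetical name; the actual kernel check is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if every entry except possibly the last one carries the
 * same payload length as the first entry.
 */
static bool frag_list_uniform(const unsigned int *len, size_t n)
{
	size_t i;

	for (i = 1; i + 1 < n; i++)
		if (len[i] != len[0])
			return false;

	return true;
}
```

A caller would take the partial splitting path only when this returns true and fall back to software GSO otherwise.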
Xin Long [Sat, 15 Apr 2017 13:56:57 +0000 (21:56 +0800)]
sctp: get list_of_streams of strreset outreq earlier
Now when processing strreset out responses, it gets outreq->list_of_streams
only when result is performed. But if result is not performed, str_p will
be NULL. It will cause panic in sctp_ulpevent_make_stream_reset_event if
nums is not 0.
This patch is to fix it by getting outreq->list_of_streams earlier, and
also to improve some code for the strreset inreq process.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add uid and cookie bpf helper to cg_skb_func_proto
BPF helper functions get_socket_cookie and get_socket_uid can be
used for network traffic classifications, among others. Expose
them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
commit 8f917bba0042 ("bpf: pass sk to helper functions") the
required skb->sk function is available at both cgroup bpf ingress
and egress hooks. With these two new helpers, cg_skb_func_proto is
effectively the same as sk_filter_func_proto.
Change since V1:
Instead of adding the helpers to cg_skb_func_proto, redirect
cg_skb_func_proto to sk_filter_func_proto since all helper functions
in sk_filter_func_proto are applicable to cg_skb_func_proto now.
Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The statistics function is called with RTNL held during probe
but with RCU held during access from /proc and elsewhere.
This is safe so update the lockdep annotation.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 14 Apr 2017 19:10:41 +0000 (22:10 +0300)]
net: phy: test the right variable in phy_write_mmd()
This is a copy and paste buglet. We meant to test for ->write_mmd but
we test for ->read_mmd.
Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD accessors") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
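The class of bug fixed here can be shown with a small sketch: the hook that is tested must be the hook that is called. The struct and function names below are hypothetical stand-ins, not the phylib definitions.

```c
#include <stddef.h>

/* Hypothetical driver ops with optional MMD accessors. */
struct phy_drv_sketch {
	int (*read_mmd)(int devad, int reg);
	int (*write_mmd)(int devad, int reg, int val);
};

/* A driver-provided read hook, used to build a read-only driver. */
static int dummy_read_mmd(int devad, int reg)
{
	(void)devad;
	(void)reg;
	return 0;
}

/* Generic fallback; returns 1 as a marker that this path was taken. */
static int generic_write_mmd(int devad, int reg, int val)
{
	(void)devad;
	(void)reg;
	(void)val;
	return 1;
}

static int write_mmd_sketch(const struct phy_drv_sketch *drv, int devad,
			    int reg, int val)
{
	/* Test write_mmd, the hook about to be called, not read_mmd. */
	if (drv->write_mmd)
		return drv->write_mmd(devad, reg, val);

	return generic_write_mmd(devad, reg, val);
}
```

With the copy and paste version of the test, a driver that implements only read_mmd would end up calling a NULL write_mmd pointer.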
Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
kernel.
- Many fixes to 6LoWPAN, in particular for BLE
- New CA8210 IEEE 802.15.4 device driver (accounting for most of the
lines of code added in this pull request)
- Added Nokia Bluetooth (UART) HCI driver
- Some serdev & TTY changes that are dependencies for the Nokia
driver (with acks from relevant maintainers and an agreement that
these come through the bluetooth tree)
- Support for new Intel Bluetooth device
- Various other minor cleanups/fixes here and there
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:30 +0000 (10:30 -0700)]
bpf: lru: Add map-in-map LRU example
This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.
It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know will never access the LRU map.
Note that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map. 8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:29 +0000 (10:30 -0700)]
bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
After doing map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent in rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)
By inactive, it usually means the element is not in cache. Hence,
there is a need to tune the PERCPU_NR_SCANS value.
This patch finds a better number of elements to
scan during each list rotation. The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements. This change only
affects the BPF_F_NO_COMMON_LRU map.
The test_lru_dist does not show meaningful difference
between 16 and 4. Our production L4 load balancer which uses
the LRU map for conntrack-ing also shows little change in cache
hit rate. Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable if we find a usecase
later that shows another value works better and/or use
a different rotation strategy.
After this change:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)
i.e. 6.28M/s -> 9.2M/s
The test_lru_dist has not shown meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:28 +0000 (10:30 -0700)]
bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
The current bpf_map_def is statically defined during compile
time. This patch allows the *_user.c program to change it during
runtime. It is done by adding load_bpf_file_fixup_map() which
takes a callback. The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.
The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However, it is hard to find one size to fit all testing
environment. Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.
This patch adds two cmdline args. One is to configure
the map's max_entries. Another is to configure the max_cnt
which controls how many times a syscall is called.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
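The callback pattern described above can be sketched in plain C as follows. Everything here is a simplified stand-in: map_def_sketch, fixup_map_cb_sketch, load_maps_sketch() and the created_entries array are hypothetical names for this example and do not reproduce the real loader or its signature.

```c
#include <stddef.h>

/* Hypothetical, reduced map definition. */
struct map_def_sketch {
	unsigned int max_entries;
};

typedef void (*fixup_map_cb_sketch)(struct map_def_sketch *def,
				    const char *name, int idx);

/* Records what each map was "created" with. */
static unsigned int created_entries[8];

/* Invoke the callback, if any, right before each map is created. */
static int load_maps_sketch(struct map_def_sketch *defs, const char **names,
			    int n, fixup_map_cb_sketch cb)
{
	int i;

	for (i = 0; i < n && i < 8; i++) {
		if (cb)
			cb(&defs[i], names[i], i);
		/* Stand-in for the actual map creation. */
		created_entries[i] = defs[i].max_entries;
	}

	return i;
}

/* Value taken from a command line argument; 0 means "leave unchanged". */
static unsigned int cmdline_max_entries;

static void fixup_from_cmdline(struct map_def_sketch *def, const char *name,
			       int idx)
{
	(void)name;
	(void)idx;
	if (cmdline_max_entries)
		def->max_entries = cmdline_max_entries;
}
```

The user program sets cmdline_max_entries while parsing its arguments and passes fixup_from_cmdline to the loader, so the compiled-in definition only serves as a default.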
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:27 +0000 (10:30 -0700)]
bpf: lru: Refactor LRU map tests in map_perf_test
One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) so that the future new
LRU test can be added without hunting for another syscall.
One of the map names is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:26 +0000 (10:30 -0700)]
bpf: lru: Cleanup test_lru_map.c
This patch does the following cleanup on test_lru_map.c
1) Fix indentation (Replace spaces by tabs)
2) Remove redundant BPF_F_NO_COMMON_LRU test
3) Simplify some comments
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:25 +0000 (10:30 -0700)]
bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU
test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU.
It just happens to work when PERCPU_FREE_TARGET == 16.
This patch:
1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU
2) Add test_lru_sanity6 to test list rotation for
the BPF_F_NO_COMMON_LRU map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvneta: fix failed to suspend if WOL is enabled
Recently, suspend/resume and WOL support were added to the mvneta driver.
If we enable WOL, then we get an error like the one below on Marvell BG4CT
platforms during suspend:
Recently we added support for SW fdbs to take over HW ones, but that
results in changing a user-visible fdb flag thus we need to send a
notification, also it's consistent with how HW takes over SW entries.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
to find an index in the settings table. phy_find_setting() starts at
index 0, and scans upwards looking for an exact speed and duplex match.
When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is
10baseT-Half duplex.
phy_find_valid() then scans from the point (and effectively only checks
one entry) before bailing out, returning MAX_NUM_SETTINGS - 1.
phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to
DUPLEX_HALF whether or not 10baseT-Half is supported. This goes
against all the comments on these functions, and 10baseT-Half may
not even be supported by the hardware.
Rework these functions, introducing a new method of scanning the table.
There are two modes of lookup that phylib wants: exact, and inexact.
- in exact mode, we return either an exact match or failure
- in inexact mode, we return an exact match if it exists, a match at
the highest speed that is not greater than the requested speed
(ignoring duplex), or failing that, the lowest supported speed, or
failure.
The biggest difference is that we always check whether the entry is
supported before further consideration, so all unsupported entries are
not considered as candidates.
This results in arguably saner behaviour, better matches the comments,
and is probably what users would expect.
This becomes important as ethernet speeds increase, PHYs exist which do
not support the 10Mbit speeds, and half-duplex is likely to become
obsolete - it's already not even an option on 10Gbit and faster links.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
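The exact and inexact lookup modes described above can be sketched as a single table scan. This is an illustration under assumptions: the six-entry table, the bitmask representation of supported modes, and the names setting_sketch and lookup_setting_sketch() are all made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

struct setting_sketch {
	int speed;
	bool full_duplex;
	unsigned int bit;	/* bit in the "supported" mask */
};

/* Sorted by descending speed, full duplex before half duplex. */
static const struct setting_sketch settings_sketch[] = {
	{ 1000, true, 1u << 5 }, { 1000, false, 1u << 4 },
	{ 100, true, 1u << 3 }, { 100, false, 1u << 2 },
	{ 10, true, 1u << 1 }, { 10, false, 1u << 0 },
};

static const struct setting_sketch *
lookup_setting_sketch(int speed, bool full_duplex, unsigned int supported,
		      bool exact)
{
	const struct setting_sketch *match = NULL, *last = NULL;
	size_t i;

	for (i = 0; i < sizeof(settings_sketch) / sizeof(settings_sketch[0]); i++) {
		const struct setting_sketch *p = &settings_sketch[i];

		/* Unsupported entries are never candidates. */
		if (!(supported & p->bit))
			continue;
		last = p;

		if (p->speed == speed && p->full_duplex == full_duplex)
			return p;	/* exact speed and duplex match */
		if (exact)
			continue;

		/* Highest supported speed not above the request. */
		if (!match && p->speed <= speed)
			match = p;
		if (p->speed < speed)
			break;
	}

	/* Inexact mode: fall back to the lowest supported speed. */
	if (!match && !exact)
		match = last;

	return match;
}
```

With only 1000 full and 100 full supported, an inexact request for 10 half resolves to 100 full rather than to an unsupported 10baseT-Half entry, and the same request in exact mode returns NULL.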
Since 3.12 it has been possible to configure the default queuing
discipline via sysctl. This patch adds ability to configure the
default queue discipline in kernel configuration. This is useful for
environments where configuring the value from userspace is difficult
to manage.
The default is still the same as before (pfifo_fast) and it is
possible to change after kernel init with sysctl. This is similar
to how TCP congestion control works.
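
For reference, the runtime value can be inspected from userspace. The
sketch below assumes the sysctl is exposed as net.core.default_qdisc, i.e.
the file /proc/sys/net/core/default_qdisc; the helper name
read_sysctl_string() is hypothetical and simply reads the first line of
any such file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysctl-style file into buf, without the
 * trailing newline. Returns 0 on success, -1 on failure.
 */
static int read_sysctl_string(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL);

	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling read_sysctl_string("/proc/sys/net/core/default_qdisc", name,
sizeof(name)) on a running system should then yield the qdisc currently
used as the default.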
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for aRFS for TCP and UDP
protocols with IPv4/IPv6.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds necessary APIs to interface with
qede aRFS support in successive patch.
It also reserves separate PTT entry for aRFS,
[as being in fastpath flow] for hardware access instead of
trying to acquire it at run time from the ptt pool.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
smsc95xx: Add comments to the registers definition
This chip is used by a lot of embedded devices and also by the Raspberry
Pi 1, 2 & 3 which were created to promote the study of computer
sciences. Students wanting to learn kernel / network device driver
programming through those devices can only rely on the Linux kernel
driver source to make their own.
This commit adds a lot of comments to the registers definition to expand
the register names.
Cc: Steve Glendinning <steve.glendinning@shawell.net> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Martin Wetterwald <martin@wetterwald.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
R. Parameswaran [Thu, 13 Apr 2017 01:31:04 +0000 (18:31 -0700)]
l2tp: device MTU setup, tunnel socket needs a lock
The MTU overhead calculation in L2TP device set-up
merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
needs to be adjusted to lock the tunnel socket while
referencing the sub-data structures to derive the
socket's IP overhead.
Reported-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 12 Apr 2017 18:49:04 +0000 (11:49 -0700)]
net: ipv6: send unsolicited NA on admin up
ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
set to 1, gratuitous arp requests are sent when the device is brought up.
The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
event; add it.
Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Apr 2017 15:08:33 +0000 (11:08 -0400)]
Merge branch 'mlx5-RDMA-netdevice'
Saeed Mahameed says:
====================
Mellanox, mlx5 RDMA net device support
This series provides the lower level mlx5 support of RDMA netdevice
creation API [1] suggested and introduced by Intel's HFI OPA VNIC
netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation.
mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current
IPoIB ULP generic netdevice, providing:
- mlx5 RSS support.
- mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..).
- Full mlx5 HW features transparent to the ULP itself.
The idea here is to reuse and benefit from the already implemented mlx5e netdevice
management and channels API for both ethernet and RDMA netdevices, since both IPoIB
and Ethernet netdevices share same common mlx5 HW resources (with some small
exceptions) and share most of the control/data path logic, it is more natural to
have them share the same code.
The differences between IPoIB and Ethernet netdevices can be summarized as:
Steering:
In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet
the traffic is handled by vports and vport steering is managed by e-switch or FW.
For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS
HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to
them (underlay QP will be 0 in case of Ethernet).
RX,TX:
Since IPoIB traffic is different, slightly modified RX and TX handlers are required,
still we do some code reuse in data path via common helper functions.
All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet
and IPoIB netdevices, e.g.
- Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..)
- Offloads, checksum, GRO, LRO, TSO, and more.
- netdevice logic and non Ethernet specific ndos (open/close, etc..)
In order to achieve what we want:
In patches 1 to 3, Erez added support for the underlay QP in mlx5_ifc and refactored
the mlx5 steering code to accept the underlay QP as a parameter for creating steering
objects and enabled flow steering for IB link.
Then we are going to use the mlx5e netdevice profile, which is already used to separate between
NIC and VF representor netdevices, to create a new type of IPoIB netdevice profile.
For that, one small refactoring is required to make mlx5e netdevice profile management
more generic and agnostic to link type, which is done in patch #4.
In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a
skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches,
using mlx5e already existing APIs.
Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS
resources, same as mlx5e but without vlan and L2 steering tables.
Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources
same as mlx5e but with one TC (tc = 0) support.
Patch #9, Implement mlx5i open/close ndos, where we reuse the mlx5e channels API, to start/stop TX/RX channels.
Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts.
Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the
mlx5i IPoIB xmit routine.
Patch #13 and #14, Have an RX handler per netdevice profile. Before this series we already did this,
in an unclean way, to separate NIC netdev and VF representor RX handlers. In patch 13 we make
the RX handler generic and bound to a profile, and in patch 14 we implement the IPoIB RX handlers.
Patch #15, Small cleanup to avoid e-switch with IPoIB netdev.
In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offload support [3]
- which was already submitted to the rdma mailing list - and this series is required
plus an extra small patch [4] which will connect between both sides and actually enables the offload.
Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable
the feature.
Add check for bit IB_QP_CREATE_NETIF_QP while creating QP.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx5e: E-switch vport manager is valid for ethernet only
Currently the driver supports only ethernet eswitch, and we want to
protect downstream IPoIB netdev from trying to access it in IB link.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it at runtime as it was done with the switchdev
mode representor rx handler.
This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.
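
The idea of binding the RX handler to the profile can be sketched as
below. This is a simplified illustration, not the driver code: struct
netdev_profile, struct rx_queue, alloc_rq() and the handler names are
hypothetical stand-ins, and the handlers only return an id so the selection
can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct rx_queue;
typedef int (*rx_handler_t)(struct rx_queue *rq);

/* Hypothetical profile: each netdevice flavour names its own RX handler. */
struct netdev_profile {
	const char *name;
	rx_handler_t handle_rx;
};

struct rx_queue {
	rx_handler_t handle_rx;
};

static int nic_handle_rx(struct rx_queue *rq)   { (void)rq; return 1; }
static int rep_handle_rx(struct rx_queue *rq)   { (void)rq; return 2; }
static int ipoib_handle_rx(struct rx_queue *rq) { (void)rq; return 3; }

static const struct netdev_profile nic_profile   = { "nic",   nic_handle_rx };
static const struct netdev_profile rep_profile   = { "rep",   rep_handle_rx };
static const struct netdev_profile ipoib_profile = { "ipoib", ipoib_handle_rx };

/*
 * Queue setup copies the handler straight from the profile. It makes no
 * assumption about what kind of private data sits behind the netdevice.
 */
static int alloc_rq(struct rx_queue *rq, const struct netdev_profile *profile)
{
	assert(rq != NULL && profile != NULL);

	if (!profile->handle_rx)
		return -1;

	rq->handle_rx = profile->handle_rx;
	return 0;
}
```

Adding a new netdevice type then only requires a new profile entry, with
no change to the queue allocation path.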
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow. The only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Break current mlx5e xmit flow into smaller blocks (helper functions)
in order to reuse them for IPoIB SKB transmission.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS
and TX HW context to perform on IPoIB traffic.
Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going
through this QP when the ULP IPoIB decides to cleanup.
Implement attach/detach mcast RDMA netdev callbacks for later RDMA
netdev use.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement open/close of IPoIB netdevice ndos using mlx5e's
channels API to manage data path resources (RQs/SQs/CQs).
Set IPoIB netdev address on dev_init ndo.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Modify mlx5e tis creation function to accept underlay qp number, which
will be needed by IPoIB.
Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to
create one TIS with the IPoIB underlay qp, for IPoIB TX SQs.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Like the mlx5e ethernet mode, on IPoIB mode we need to create RX steering
tables, but IPoIB does not require MAC and VLAN steering tables so the
only tables we create in here are:
1. TTC Table (Traffic Type Classifier table for RSS steering)
2. ARFS Table (for accelerated RFS support)
Creation of those tables is identical to mlx5e ethernet mode, hence the
use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement IPoIB RX RSS (RQTs and TIRs) HW objects creation.
All we do here is simply reuse the mlx5e implementation to create
direct and indirect (RSS) steering HW objects.
For that we just expose
mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions into en.h
and call them from ipoib.c in init/cleanup_rx IPoIB netdevice profile
callbacks.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c
file with empty implementation.
Downstream patches will provide the full mlx5 rdma netdevice acceleration
support for IPoIB into this new file, by using the mlx5e netdevice
profile and new mlx5_channels APIs and infrastructures.
Same as already done in mlx5e NIC netdevice and switchdev mode VF
representors.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for mlx5e RDMA net_device support, here we generalize
mlx5e_attach/detach in a way that those functions will be agnostic
to link type. For that we move ethernet specific NIC net device logic out
of those functions into {nic,rep}_{enable/disable} mlx5e NIC and
representor profiles callbacks.
Also some of the logic was moved only to NIC profile since it is not right
to have this logic for representor net device (e.g. set port MTU).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>