git.baikalelectronics.ru Git

net: tcp: check skb is non-NULL for exact match on lookups

Andrey reported the following error report while running the syzkaller
fuzzer:

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800398c4480 task.stack: ffff88003b468000
RIP: 0010:[<ffffffff83091106>]  [<     inline     >]
inet_exact_dif_match include/net/tcp.h:808
RIP: 0010:[<ffffffff83091106>]  [<ffffffff83091106>]
__inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
RSP: 0018:ffff88003b46f270  EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000004242 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffc90000e3c000 RDI: 0000000000000054
RBP: ffff88003b46f2d8 R08: 0000000000004000 R09: ffffffff830910e7
R10: 0000000000000000 R11: 000000000000000a R12: ffffffff867fa0c0
R13: 0000000000004242 R14: 0000000000000003 R15: dffffc0000000000
FS:  00007fb135881700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020cc3000 CR3: 000000006d56a000 CR4: 00000000000006f0
Stack:
0000000000000000 000000000601a8c0 0000000000000000 ffffffff00004242
424200003b9083c2 ffff88003def4041 ffffffff84e7e040 0000000000000246
ffff88003a0911c0 0000000000000000 ffff88003a091298 ffff88003b9083ae
Call Trace:
[<ffffffff831100f4>] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
[<ffffffff83115b1b>] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
[<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
...

MD5 has a code path that calls __inet_lookup_listener with a null skb,
so inet{6}_exact_dif_match needs to check skb against null before pulling
the flag.

Fixes: a2bd0ab1c119 ("net: Require exact match for TCP socket lookups if
       dif is l3mdev")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix potential memory corruption

Imagine initial value of max_skb_frags is 17, and last
skb in write queue has 15 frags.

Then max_skb_frags is lowered to 14 or smaller value.

tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.

Fixes: c08eb253e011 ("net:Add sysctl_max_skb_frags")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Cc: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Correctly map aggregation replacement pages

Driver allocates replacement buffers before-hand to make
sure whenever an aggregation begins there would be a replacement
for the Rx buffers, as we can't release the buffer until
aggregation is terminated and driver logic assumes the Rx rings
are always full.

For every other Rx page that's being allocated [I.e., regular]
the page is being completely mapped while for the replacement
buffers only the first portion of the page is being mapped.
This means that:
a. Once replacement buffer replenishes the regular Rx ring,
assuming there's more than a single packet on page we'd post unmapped
memory toward HW [assuming mapping is actually done in granularity
smaller than page].
b. Unmaps are being done for the entire page, which is incorrect.

Fixes: bd51c9af2e5e0 ("qede: Add slowpath/fastpath support and enable hardware GRO")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: correct device ID of T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: fix sleeping inside inet_wait_for_connect()

Andrey reported this kernel warning:

  WARNING: CPU: 0 PID: 4608 at kernel/sched/core.c:7724
  __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
  do not call blocking ops when !TASK_RUNNING; state=1 set at
  [<ffffffff811f5a5c>] prepare_to_wait+0xbc/0x210
  kernel/sched/wait.c:178
  Modules linked in:
  CPU: 0 PID: 4608 Comm: syz-executor Not tainted 4.9.0-rc2+ #320
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
   ffff88006625f7a0 ffffffff81b46914 ffff88006625f818 0000000000000000
   ffffffff84052960 0000000000000000 ffff88006625f7e8 ffffffff81111237
   ffff88006aceac00 ffffffff00001e2c ffffed000cc4beff ffffffff84052960
  Call Trace:
   [<     inline     >] __dump_stack lib/dump_stack.c:15
   [<ffffffff81b46914>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
   [<ffffffff81111237>] __warn+0x1a7/0x1f0 kernel/panic.c:550
   [<ffffffff8111132c>] warn_slowpath_fmt+0xac/0xd0 kernel/panic.c:565
   [<ffffffff811922fc>] __might_sleep+0x14c/0x1a0 kernel/sched/core.c:7719
   [<     inline     >] slab_pre_alloc_hook mm/slab.h:393
   [<     inline     >] slab_alloc_node mm/slub.c:2634
   [<     inline     >] slab_alloc mm/slub.c:2716
   [<ffffffff81508da0>] __kmalloc_track_caller+0x150/0x2a0 mm/slub.c:4240
   [<ffffffff8146be14>] kmemdup+0x24/0x50 mm/util.c:113
   [<ffffffff8388b2cf>] dccp_feat_clone_sp_val.part.5+0x4f/0xe0 net/dccp/feat.c:374
   [<     inline     >] dccp_feat_clone_sp_val net/dccp/feat.c:1141
   [<     inline     >] dccp_feat_change_recv net/dccp/feat.c:1141
   [<ffffffff8388d491>] dccp_feat_parse_options+0xaa1/0x13d0 net/dccp/feat.c:1411
   [<ffffffff83894f01>] dccp_parse_options+0x721/0x1010 net/dccp/options.c:128
   [<ffffffff83891280>] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644
   [<ffffffff838b8a94>] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681
   [<     inline     >] sk_backlog_rcv ./include/net/sock.h:872
   [<ffffffff82b7ceb6>] __release_sock+0x126/0x3a0 net/core/sock.c:2044
   [<ffffffff82b7d189>] release_sock+0x59/0x1c0 net/core/sock.c:2502
   [<     inline     >] inet_wait_for_connect net/ipv4/af_inet.c:547
   [<ffffffff8316b2a2>] __inet_stream_connect+0x5d2/0xbb0 net/ipv4/af_inet.c:617
   [<ffffffff8316b8d5>] inet_stream_connect+0x55/0xa0 net/ipv4/af_inet.c:656
   [<ffffffff82b705e4>] SYSC_connect+0x244/0x2f0 net/socket.c:1533
   [<ffffffff82b72dd4>] SyS_connect+0x24/0x30 net/socket.c:1514
   [<ffffffff83fbf701>] entry_SYSCALL_64_fastpath+0x1f/0xc2
  arch/x86/entry/entry_64.S:209

Unlike commit a4f565bbf2e3c7131536a639f959513278524aad
("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the
sleeping function is called before schedule_timeout(), this is indeed
a bug. Fix this by moving the wait logic to the new API, it is similar
to commit 2404b47b282ccbc85cee2c63afba695295cefacf
("netdev, sched/wait: Fix sleeping inside wait event").

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: cast grant table reference first to type int

IS_ERR_VALUE() in commit 79bb99af76b3787c649d693c61923168fb85a53b
("xen-netfront: do not cast grant table reference to signed short") would
not return true for error code unless we cast ref first to type int.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_udp_tunnel: remove unused IPCB related codes

Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are
never used before it reaches ip6tunnel_xmit(), and past that point the
control buffer is no longer interpreted as IPCB.

This clears these unused IPCB related codes. Currently there is no skb
scrubbing in ip6_udp_tunnel, otherwise IPCB(skb)->opt might need to be
cleared for IPv4 packets, as shown in e6eebf3aa5a
("tunnel: Clear IPCB(skb)->opt before dst_link_failure called").

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Update MELLANOX MLX5 core VPI driver maintainers

Add myself as a maintainer for mlx5 core driver as well.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: ensure coalesce settings survive read-modify-write

The coalesce settings behave badly when changing just one value:

... # ethtool -c eth0
rx-usecs: 249
... # ethtool -C eth0 tx-usecs 250
... # ethtool -c eth0
rx-usecs: 248

This occurs due to rounding errors when calculating the microseconds
value - the divisons round down. This causes (eg) the rx-usecs to
decrease by one every time the tx-usecs value is set as per the above.

Fix this by making the divison round-to-nearest.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Simplify a test

'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix: escape all null bytes in abstract unix domain socket

Abstract unix domain socket may embed null characters,
these should be translated to '@' when printed out to
proc the same way the null prefix is currently being
translated.

This helps for tools such as netstat, lsof and the proc
based implementation in ss to show all the significant
bytes of the name (instead of getting cut at the first
null occurrence).

Signed-off-by: Isaac Boukris <iboukris@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: use correct value for SGMII_LN_UCDR_SO_GAIN_MODE0

The documentation says that SGMII_LN_UCDR_SO_GAIN_MODE0 should be
set to 0, not 6, on the Qualcomm Technologies QDF2432.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xgene-coalescing-bugs'

Iyappan Subramanian says:

====================
drivers: net: xgene: Fix coalescing bugs

This patch set fixes the following,

1. Since ethernet v1 hardware has a bug related to coalescing,
disabling this feature
2. Fixing ethernet v2 hardware, interrupt trigger region
id to 2, to kickoff coalescing
====================

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: xgene: fix: Coalescing values for v2 hardware

Changing the interrupt trigger region id to 2 and the
corresponding threshold set0/set1 values to 8/16.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: xgene: fix: Disable coalescing on v1 hardware

Since ethernet v1 hardware has a bug related to coalescing, disabling
this feature.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-4.9-20161031' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-10-31

this is a pull request of two patches for the upcoming v4.9 release.

The first patch is by Lukas Resch for the sja1000 plx_pci driver that adds
support for Moxa CAN devices. The second patch is by Oliver Hartkopp and fixes
a potential kernel panic in the CAN broadcast manager.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: stop clearing DMA receive control register right after it is set

Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today. I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.

On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.

This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Fixes: 25cb220d7b4b ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-hold-transport-fixes'

Xin Long says:

====================
sctp: a bunch of fixes by holding transport

There are several places where it holds assoc after getting transport by
searching from transport rhashtable, it may cause use-after-free issue.

This patchset is to fix them by holding transport instead.

v1->v2:
Fix the changelog of patch 2/3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: hold transport instead of assoc when lookup assoc in rx path

Prior to this patch, in rx path, before calling lock_sock, it needed to
hold assoc when got it by __sctp_lookup_association, in case other place
would free/put assoc.

But in __sctp_lookup_association, it lookup and hold transport, then got
assoc by transport->assoc, then hold assoc and put transport. It means
it didn't hold transport, yet it was returned and later on directly
assigned to chunk->transport.

Without the protection of sock lock, the transport may be freed/put by
other places, which would cause a use-after-free issue.

This patch is to fix this issue by holding transport instead of assoc.
As holding transport can make sure to access assoc is also safe, and
actually it looks up assoc by searching transport rhashtable, to hold
transport here makes more sense.

Note that the function will be renamed later on on another patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: return back transport in __sctp_rcv_init_lookup

Prior to this patch, it used a local variable to save the transport that is
looked up by __sctp_lookup_association(), and didn't return it back. But in
sctp_rcv, it is used to initialize chunk->transport. So when hitting this,
even if it found the transport, it was still initializing chunk->transport
with null instead.

This patch is to return the transport back through transport pointer
that is from __sctp_rcv_lookup_harder().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: hold transport instead of assoc in sctp_diag

In sctp_transport_lookup_process(), Commit f659765ec7bb ("sctp: fix
the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out
of rcu lock, but it put transport and hold assoc instead, and ignore
that cb() still uses transport. It may cause a use-after-free issue.

This patch is to hold transport instead of assoc there.

Fixes: f659765ec7bb ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netfront: do not cast grant table reference to signed short

While grant reference is of type uint32_t, xen-netfront erroneously casts
it to signed short in BUG_ON().

This would lead to the xen domU panic during boot-up or migration when it
is attached with lots of paravirtual devices.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: bcm: fix warning in bcm_connect/proc_register

Andrey Konovalov reported an issue with proc_register in bcm.c.
As suggested by Cong Wang this patch adds a lock_sock() protection and
a check for unsuccessful proc_create_data() in bcm_connect().

Reference: http://marc.info/?l=linux-netdev&m=147732648731237

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: plx_pci: Add support for Moxa CAN devices

This patch adds support for Moxa CAN devices.

Signed-off-by: Lukas Resch <l.resch@incubedit.com>
Signed-off-by: Christoph Zehentner <c.zehentner@incubedit.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge tag 'wireless-drivers-for-davem-2016-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.9

iwlwifi

* some fixes for suspend/resume with unified FW images
* a fix for a false-positive lockdep report
* a fix for multi-queue that caused an unnecessary 1 second latency
* a fix for an ACPI parsing bug that caused a misleading error message

brcmfmac

* fix a variable uninitialised warning in brcmf_cfg80211_start_ap()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Fix incorrect reuse of MID entries

In the device, a MID entry represents a group of local ports, which can
later be bound to a MDB entry.

The lookup of an existing MID entry is currently done using the provided
MC MAC address and VID, from the Linux bridge. However, this can result
in an incorrect reuse of the same MID index in different VLAN-unaware
bridges (same IP MC group and VID 0).

Fix this by performing the lookup based on FID instead of VID, which is
unique across different bridges.

Fixes: ff14ef22bfb8 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Fix statistics' strings for Tx/Rx queues

When an interface is configured to use Tx/Rx-only queues,
the length of the statistics would be shortened to accomodate only the
statistics required per-each queue, and the values would be provided
accordingly.
However, the strings provided would still contain both Tx and Rx strings
for each one of the queues [regardless of its configuration], which might
lead to out-of-bound access when filling the buffers as well as incorrect
statistics presented.

Fixes: b174a0151266 ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mangle zero checksum in skb_checksum_help()

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: clear sk_err_soft in sk_clone_lock()

At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.

Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dctcp: avoid bogus doubling of cwnd after loss

If a congestion control module doesn't provide .undo_cwnd function,
tcp_undo_cwnd_reduction() will set cwnd to

tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);

... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.

This can cause severe growth of cwnd (eventually overflowing u32).

Fix this by saving last cwnd on loss and restore cwnd based on that,
similar to cubic and other algorithms.

Fixes: 752f76819578e2 ("net: tcp: add DCTCP congestion control algorithm")
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add mtu lock check in __ip6_rt_update_pmtu

Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
It leaded to that mtu lock doesn't really work when receiving the pkt
of ICMPV6_PKT_TOOBIG.

This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
did in __ip_rt_update_pmtu.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't use ufo handling on later transformed packets

Similar to commit 8d8a236f68d8 ("ipv4: Don't use ufo handling on later
transformed packets"), don't perform UFO on packets that will be IPsec
transformed. To detect it we rely on the fact that headerlen in
dst_entry is non-zero only for transformation bundles (xfrm_dst
objects).

Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
such as a dummy device:

  DEV=dum0 LEN=1493

  ip li add $DEV type dummy
  ip addr add fc00::1/64 dev $DEV nodad
  ip link set $DEV up
  ip xfrm policy add dir out src fc00::1 dst fc00::2 \
     tmpl src fc00::1 dst fc00::2 proto esp spi 1
  ip xfrm state add src fc00::1 dst fc00::2 \
     proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b

  tcpdump -n -nn -i $DEV -t &
  socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN

tcpdump output before:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|48)
  IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88

... and after:

  IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
  IP6 fc00::1 > fc00::2: frag (1448|80)

Fixes: 33655dd552ec ("[IPv4/IPv6]: UFO Scatter-gather approach")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: Fix broken RX checksums.

The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions. Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logig does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

So, disable hardware RX checksum support for VER_02, and fix
an obvious coding error for IPV6 checksums in the same function.

Because this bug results in silent data corruption,
it is a good candidate for back-porting to -stable >= 3.16.xx.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...

geneve: avoid using stale geneve socket.

This patch is similar to earlier vxlan patch.
Geneve device close operation frees geneve socket. This
operation can race with geneve-xmit function which
dereferences geneve socket. Following patch uses RCU
mechanism to avoid this situation.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: avoid using stale vxlan socket.

When vxlan device is closed vxlan socket is freed. This
operation can race with vxlan-xmit function which
dereferences vxlan socket. Following patch uses RCU
mechanism to avoid this situation.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Fix out-of-bound fastpath memory access

Driver allocates a shadow array for transmitted SKBs with X entries;
That means valid indices are {0,...,X - 1}. [X == 8191]
Problem is the driver also uses X as a mask for a
producer/consumer in order to choose the right entry in the
array which allows access to entry X which is out of bounds.

To fix this, simply allocate X + 1 entries in the shadow array.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: add dp83822 PHY support

This PHY has a compatible register set with DP83848x so
add support for it.

Acked-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: fix rq disable

When MTU is changed from 9000 to 1500 while there is burst of inbound 9000
bytes packets, adaptor sometimes delivers 9000 bytes packets to 1500 bytes
buffers. This causes memory corruption and sometimes crash.

This is because of a race condition in adaptor between "RQ disable"
clearing descriptor mini-cache and mini-cache valid bit being set by
completion of descriptor fetch. This can result in stale RQ desc being
cached and used when packets arrive. In this case, the stale descriptor
have old MTU value.

Solution is to write RQ->disable twice. The first write will stop any
further desc fetches, allowing the second disable to clear the mini-cache
valid bit without danger of a race.

Also, the check for rq->running becoming 0 after writing rq->enable to 0
is not done properly. When incoming packets are flooding the interface,
rq->running will pulse high for each dropped packet. Since the driver was
waiting for 10us between each poll, it is possible to see rq->running = 1
1000 times in a row, even though it is not actually stuck running.
This results in false failure of vnic_rq_disable(). Fix is to try more
than 1000 time without delay between polls to ensure we do not miss when
running goes low.

In old adaptors rq->enable needs to be re-written to 0 when posted_index
is reset in vnic_rq_clean() in order to keep rq->prefetch_index in sync.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix broadcast link synchronization problem

In commit 790a53b23e88 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.

To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson <thompa.atl@gmail.com>
Tested-by: John Thompson <thompa.atl@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Fix missing brackets in init_sub_crq_irqs

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context

Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"

This reverts commit 2f4f29f45c82bafecc7df7635fe094d933822a01.

It introduced kbuild failures, new version coming.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2016-10-27

This series contains fixes to ixgbe and i40e.

Emil fixes a NULL pointer dereference when a macvlan interface is brought
up while the PF is still down.

David root caused the original panic that was fixed by commit id
(3ef0ceaf169df6 "i40e: Fix kernel panic on enable/disable LLDP") and the
fix was not quite correct, so removed the get_default_tc() and replaced
it with a #define since there is only one TC supported as a default.

Guilherme Piccoli fixes an issue where if we modprobe the driver module
without enough MSI-X interrupts, then unload the module and reload it
again, the kernel would crash. So if we fail to allocate enough MSI-X
interrupts, we should disable them since they were previously enabled.

Huaibin Wang found that the order of the arguments for
ndo_dflt_bridge_getlink() were in the correct order, so fix the order.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold

Commit 877baba "ipv4: Update parameters for csum_tcpudp_magic to their
original types" changed parameters for csum_tcpudp_magic and
csum_tcpudp_nofold for many platforms but not for PowerPC.

Fixes: 877baba "ipv4: Update parameters for csum_tcpudp_magic to their original types"
Cc: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.9-rc3

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 bugfix from Thomas Gleixner:
"A single bugfix for the recent changes related to registering the boot
  cpu when this has not happened before prefill_possible_map().

  The main problem with this change got fixed already, but we missed the
  case where the local APIC is not yet mapped, when prefill_possible_map()
  is invoked, so the registration of the boot cpu which has the APIC bit
  set in CPUID will explode.

  I should have seen that issue earlier, but all I can do now is feeling
  embarassed"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Init apic mapping before usage

Merge branch 'mlx4-fixes'

Tariq Toukan says:

====================
mlx4 misc fixes for 4.9

This patchset contains several bug fixes from the team to the
mlx4 Eth and Core drivers.

Series generated against net commit:
bfbf4dbf30fd 'sctp: fix the panic caused by route update'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Save slave ethtool stats command

Following the previous patch, as an optimization, the slave will
not even bother sending the DUMP_ETH_STATS command over the
comm channel.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix potential deadlock in port statistics flow

mlx4_en_DUMP_ETH_STATS took the *counter mutex* and then
called the FW command, with WRAPPED attribute. As a result, the fw command
is wrapped on the Hypervisor when it calls mlx4_en_DUMP_ETH_STATS.
The FW command wrapper flow on the hypervisor takes the *slave_cmd_mutex*
during processing.

At the same time, a VF could be in the process of coming up, and could
call mlx4_QUERY_FUNC_CAP. On the hypervisor, the command flow takes the
*slave_cmd_mutex*, then executes mlx4_QUERY_FUNC_CAP_wrapper.
mlx4_QUERY_FUNC_CAP wrapper calls mlx4_get_default_counter_index(),
which takes the *counter mutex*. DEADLOCK.

The fix is that the DUMP_ETH_STATS fw command should be called with
the NATIVE attribute, so that on the hypervisor, this command does not
enter the wrapper flow.

Since the Hypervisor no longer goes through the wrapper code, we also
simply return 0 in mlx4_DUMP_ETH_STATS_wrapper (i.e.the function succeeds,
but the returned data will be all zeroes).
No need to test if it is the Hypervisor going through the wrapper.

Fixes: 915d3961229c ("mlx4_core: Add "native" argument to mlx4_cmd ...")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Fix firmware command timeout during interrupt test

Currently interrupt test that is part of ethtool selftest runs the
check over all interrupt vectors of the device.
In mlx4_en package part of interrupt vectors are uninitialized since
mlx4_ib doesn't exist. This causes NOP FW command to time out.
Change logic to test current port interrupt vectors only.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Do not access comm channel if it has not yet been initialized

In the Hypervisor, there are several FW commands which are invoked
before the comm channel is initialized (in mlx4_multi_func_init).
These include MOD_STAT_CONFIG, QUERY_DEV_CAP, INIT_HCA, and others.

If any of these commands fails, say with a timeout, the Hypervisor
driver enters the internal error reset flow. In this flow, the driver
attempts to notify all slaves via the comm channel that an internal error
has occurred.

Since the comm channel has not yet been initialized (i.e., mapped via
ioremap), this will cause dereferencing a NULL pointer.

To fix this, do not access the comm channel in the internal error flow
if it has not yet been initialized.

Fixes: ff15addba7bf ("net/mlx4_core: Enable device recovery flow with SRIOV")
Fixes: bfa3f2377fe3 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix panic during reboot

Fix a kernel panic that occurs as a result of an asynchronous event
handled in roce_gid_mgmt:
mlx4_en_get_drvinfo is called and accesses freed resources.

This happens in a shutdown flow only, since pci device is destroyed
while netdevice is still alive.

Fixes: eebe24d1aff7 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Process all completions in RX rings after port goes up

Currently there is a race between incoming traffic and
initialization flow. HW is able to receive the packets
after INIT_PORT is done and unicast steering is configured.
Before we set priv->port_up NAPI is not scheduled and
receive queues become full. Therefore we never get
new interrupts about the completions.
This issue could happen if running heavy traffic during
bringing port up.
The resolution is to schedule NAPI once port_up is set.
If receive queues were full this will process all cqes
and release them.

Fixes: eebe24d1aff7 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Resolve dividing by zero in 32-bit system

When doing roundup_pow_of_two for large enough number with
bit 31, an overflow will occur and a value equal to 1 will
be returned. In this case 1 will be subtracted from the return
value and division by zero will be reached.

Fixes: 95ab31a07037 ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Change the default value of enable_qos

Change the default status of quality of service back to disabled,
as it hurts performance in some cases.

Fixes: 9c50ec4f3c70 ("net/mlx4: Set enhanced QoS support by default when ...")
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Avoid setting ports to auto when only one port type is supported

When only one port type is supported, it should be read only.
We reject changing requests, even to the auto sense mode.

Fixes: 86ba6f8f975b ("mlx4_core: Add link type autosensing")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec

The resource type enum in the resource tracker was incorrect.
RES_EQ was put in the position of RES_NPORT_ID (a FC resource).

Since the remaining resources maintain their current values,
and RES_EQ is not passed from slaves to the hypervisor in any
FW command, this change affects only the hypervisor.
Therefore, there is no backwards-compatibility issue.

Fixes: 770fe203fcd3 ("mlx4_core: initial header-file changes for SRIOV support")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs

Pull ubi/ubifs fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:

   - A regression wrt overlayfs, introduced in -rc2.
   - An UBI issue, found by Dan Carpenter's static checker"

* tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix regression in ubifs_readdir()
  ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()

rds: debug messages are enabled by default

rds use Kconfig option called "RDS_DEBUG" to enable rds debug messages.
This option cause the rds Makefile to add -DDEBUG to the rds gcc command
line.

When CONFIG_DYNAMIC_DEBUG is enabled, the "DEBUG" macro is used by
include/linux/dynamic_debug.h to decide if dynamic debug prints should
be sent by default to the kernel log.

rds should not enable this macro for production builds. rds dynamic
debug work as expected follow this fix.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2016-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two fixes:
* a fix to process all events while suspending, so any
   potential calls into the driver are done before it is
   suspended
* small markup fixes for the sphinx documentation conversion
   that's coming into the tree via the doc tree
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context

Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Fetch the phy connection type from DT

The MAC is capable of RGMII mode and that is probably a more typical
connection type than GMII today (eg it is used by Marvell Reference
designs for several SOCs). Let DT users specify the standard

phy-connection-type = "rgmii-id";

On a phy node.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"We haven't seen a whole lot of fixes for the first two weeks since the
  merge window, but here is the batch that we have at the moment.

  Nothing sticks out as particularly bad or scary, it's mostly a handful
  of smaller fixes to several platforms. The Uniphier reset controller
  changes could probably have been delayed to 4.10, but they're not
  scary and just plumbing up driver changes that went in during the
  merge window.

  We're also adding another maintainer to Marvell Berlin platforms, to
  help out when Sebastian is too busy. Yay teamwork!"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: imx: mach-imx6q: Fix the PHY ID mask for AR8031
  ARM: dts: vf610: fix IRQ flag of global timer
  ARM: imx: gpc: Fix the imx_gpc_genpd_init() error path
  ARM: imx: gpc: Initialize all power domains
  arm64: dts: Updated NAND DT properties for NS2 SVK
  arm64: dts: uniphier: change MIO node to SD control node
  ARM: dts: uniphier: change MIO node to SD control node
  reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs
  arm64: uniphier: select ARCH_HAS_RESET_CONTROLLER
  ARM: uniphier: select ARCH_HAS_RESET_CONTROLLER
  arm64: dts: Add timer erratum property for LS2080A and LS1043A
  arm64: dts: rockchip: remove the abuse of keep-power-in-suspend
  ARM: multi_v7_defconfig: Enable Intel e1000e driver
  MAINTAINERS: add myself as Marvell berlin SoC maintainer
  bus: qcom-ebi2: depend on ARCH_QCOM or COMPILE_TEST
  ARM: dts: fix the SD card on the Snowball
  arm64: dts: rockchip: remove always-on and boot-on from vcc_sd
  arm64: dts: marvell: fix clocksource for CP110 master SPI0
  ARM: mvebu: Select corediv clk for all mvebu v7 SoC

Merge tag 'batadv-net-for-davem-20161026' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are three batman-adv bugfix patches:

- Fix RCU usage for neighbor list, by Sven Eckelmann

- Fix BATADV_DBG_ALL loglevel to include TP Meter messages, by Sven Eckelmann

- Fix possible splat when disabling an interface, by Linus Luessing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "hv_netvsc: report vmbus name in ethtool"

This reverts commit b009dbbd7554
("hv_netvsc: report vmbus name in ethtool")'
because of problem introduced by commit f9a56e5d6a0ba
("Drivers: hv: make VMBus bus ids persistent").
This changed the format of the vmbus name and this new format is too
long to fit in the bus_info field of ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: on direct_xmit, limit tso and csum to supported devices

When transmitting on a packet socket with PACKET_VNET_HDR and
PACKET_QDISC_BYPASS, validate device support for features requested
in vnet_hdr.

Drop TSO packets sent to devices that do not support TSO or have the
feature disabled. Note that the latter currently do process those
packets correctly, regardless of not advertising the feature.

Because of SKB_GSO_DODGY, it is not sufficient to test device features
with netif_needs_gso. Full validate_xmit_skb is needed.

Switch to software checksum for non-TSO packets that request checksum
offload if that device feature is unsupported or disabled. Note that
similar to the TSO case, device drivers may perform checksum offload
correctly even when not advertising it.

When switching to software checksum, packets hit skb_checksum_help,
which has two BUG_ON checksum not in linear segment. Packet sockets
always allocate at least up to csum_start + csum_off + 2 as linear.

Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c

  ethtool -K eth0 tso off tx on
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
  psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N

  ethtool -K eth0 tx off
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
  psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N

v2:
  - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)

Fixes: da6f47518774 ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched actions: use nla_parse_nested()

Use nla_parse_nested instead of open-coding the call to
nla_parse() with the attribute data/len.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Fix error handling in alloc_uld_rxqs().

Fix to release resources properly in error handling path of
alloc_uld_rxqs(), This patch also removes unwanted arguments
and avoids calling the same function twice.

Fixes: 781edc2e4515 (cxgb4: Add support for dynamic allocation
of resources for ULD
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: avoid a -Wmaybe-uninitialize warning

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()

This patch updates skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit() when an
IPv6 header is installed to a socket buffer.

This is not a cosmetic change. Without updating this value, GSO packets
transmitted through an ipip6 tunnel have the protocol of ETH_P_IP and
skb_mac_gso_segment() will attempt to call gso_segment() for IPv4,
which results in the packets being dropped.

Fixes: 686f758dbaf7 ("ip4ip6: Support for GSO/GRO")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix samples to add fake KBUILD_MODNAME

Some of the sample files are causing issues when they are loaded with tc
and cls_bpf, meaning tc bails out while trying to parse the resulting ELF
file as program/map/etc sections are not present, which can be easily
spotted with readelf(1).

Currently, BPF samples are including some of the kernel headers and mid
term we should change them to refrain from this, really. When dynamic
debugging is enabled, we bail out due to undeclared KBUILD_MODNAME, which
is easily overlooked in the build as clang spills this along with other
noisy warnings from various header includes, and llc still generates an
ELF file with mentioned characteristics. For just playing around with BPF
examples, this can be a bit of a hurdle to take.

Just add a fake KBUILD_MODNAME as a band-aid to fix the issue, same is
done in xdp*_kern samples already.

Fixes: 5fc8dfb94afc ("samples/bpf: add 'pointer to packet' tests")
Fixes: 67eca47f4039 ("samples/bpf: Add tunnel set/get tests.")
Fixes: e6c8cf7ac2fb ("cgroup: bpf: Add an example to do cgroup checking in BPF")
Reported-by: Chandrasekar Kannan <ckannan@console.to>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'char-misc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few small char/misc driver fixes for reported issues.

  The "biggest" are two binder fixes for reported issues that have been
  shipping in Android phones for a while now, the others are various
  fixes for reported problems.

  And there's a MAINTAINERS update for good measure.

  All have been in linux-next with no reported issues"

* tag 'char-misc-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  MAINTAINERS: Add entry for genwqe driver
  VMCI: Doorbell create and destroy fixes
  GenWQE: Fix bad page access during abort of resource allocation
  vme: vme_get_size potentially returning incorrect value on failure
  extcon: qcom-spmi-misc: Sync the extcon state on interrupt
  hv: do not lose pending heartbeat vmbus packets
  mei: txe: don't clean an unprocessed interrupt cause.
  ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
  ANDROID: binder: Add strong ref checks

Merge tag 'v4.9-rockchip-dts64-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes

Correct regulator handling on Rockchip arm64 boards to make
bind/unbind calls work correctly and remove a sdio-only
property from non-sdio mmc hosts, that accidentially was
added there.

* tag 'v4.9-rockchip-dts64-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: remove the abuse of keep-power-in-suspend
arm64: dts: rockchip: remove always-on and boot-on from vcc_sd

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'arm-soc/for-4.9/devicetree-arm64-fixes' of http://github.com/Broadcom/stblinux into fixes

This pull request contains a single fix for Broadcom ARM64-based SoCs:

- Ray adds the required bus width and OOB sector size properties to the
  Northstar 2 SVK reference board in order for the NAND controller to work
  properly

* tag 'arm-soc/for-4.9/devicetree-arm64-fixes' of http://github.com/Broadcom/stblinux:
  arm64: dts: Updated NAND DT properties for NS2 SVK

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'imx-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes

The i.MX fixes for 4.9:
- A couple of patches from Fabio to fix the GPC power domain regression
   which is caused by PM Domain core change 456834d9fe1370
   ("PM / Domains: Verify the PM domain is present when adding a
   provider"), and a related kernel crash seen with multi_v7_defconfig
   build.
- Correct the PHY ID mask for AR8031 to match phy driver code.
- Apply new added timer erratum A008585 for LS1043A and LS2080A SoC.
- Correct vf610 global timer IRQ flag to avoid warning from gic driver
   after commit dc756ad0ca14 ("irqchip/gic: WARN if setting the
   interrupt type for a PPI fails").

* tag 'imx-fixes-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: imx: mach-imx6q: Fix the PHY ID mask for AR8031
  ARM: dts: vf610: fix IRQ flag of global timer
  ARM: imx: gpc: Fix the imx_gpc_genpd_init() error path
  ARM: imx: gpc: Initialize all power domains
  arm64: dts: Add timer erratum property for LS2080A and LS1043A

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'uniphier-fixes-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-uniphier into fixes

UniPhier ARM SoC fixes for v4.9

- Add "select ARCH_HAS_RESET_CONTROLLER" in Kconfig
- Rename wrongly-named mioctrl to sdctrl

* tag 'uniphier-fixes-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-uniphier:
  arm64: dts: uniphier: change MIO node to SD control node
  ARM: dts: uniphier: change MIO node to SD control node
  reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs
  arm64: uniphier: select ARCH_HAS_RESET_CONTROLLER
  ARM: uniphier: select ARCH_HAS_RESET_CONTROLLER

Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are two small driver core / kernfs fixes for 4.9-rc3.

  One makes the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more
  explicit that this is a crazy thing to enable for a distro kernel
  (thanks for trying Fedora!), the other resolves an issue with vim
  opening kernfs files (sysfs, configfs, etc.)

  Both have been in linux-next with no reported issues"

* tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Make Kconfig text for DEBUG_TEST_DRIVER_REMOVE stronger
  kernfs: Add noop_fsync to supported kernfs_file_fops

Merge tag 'staging-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
"Here are some small staging and iio driver fixes for reported issues
  for 4.9-rc3. Nothing major, the "largest" being a lustre fix for a
  sysfs file that was obviously wrong, and had never been tested, so it
  was moved to debugfs as that is where it belongs. The others are small
  bug fixes for reported issues with various staging or iio drivers.

  All have been in linux-next for a while with no reported issues"

* tag 'staging-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  greybus: fix a leak on error in gb_module_create()
  greybus: es2: fix error return code in ap_probe()
  greybus: arche-platform: Add missing of_node_put() in arche_platform_change_state()
  staging: android: ion: Fix error handling in ion_query_heaps()
  iio: accel: sca3000_core: avoid potentially uninitialized variable
  iio:chemical:atlas-ph-sensor: Fix use of 32 bit int to hold 16 bit big endian value
  staging/lustre/llite: Move unstable_stats from sysfs to debugfs
  Staging: wilc1000: Fix kernel Oops on opening the device
  staging: android/ion: testing the wrong variable
  Staging: greybus: uart: Use gbphy_dev->dev instead of bundle->dev
  Staging: greybus: gpio: Use gbphy_dev->dev instead of bundle->dev
  iio: adc: ti-adc081c: Select IIO_TRIGGERED_BUFFER to prevent build errors
  iio: maxim_thermocouple: Align 16 bit big endian value of raw reads

Merge tag 'tty-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty and serial driver fixes for reported
  issues for 4.9-rc3. Nothing major, but they do resolve a bunch of
  problems with the tty core changes that are in 4.9-rc1, and finally
  the atmel serial driver is back working properly.

  All have been in linux-next with no reported issues"

* tag 'tty-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup
  tty: serial_core: Fix serial console crash on port shutdown
  tty/serial: at91: fix hardware handshake on Atmel platforms
  vt: clear selection before resizing
  sc16is7xx: always write state when configuring GPIO as an output
  sh-sci: document R8A7743/5 support
  tty: serial: 8250: 8250_core: NXP SC16C2552 workaround
  tty: limit terminal size to 4M chars
  tty: serial: fsl_lpuart: Fix Tx DMA edge case
  serial: 8250_lpss: enable MSI for sure
  serial: core: fix console problems on uart_close
  serial: 8250_uniphier: fix clearing divisor latch access bit
  serial: 8250_uniphier: fix more unterminated string
  serial: pch_uart: add terminate entry for dmi_system_id tables
  devicetree: bindings: uart: Add new compatible string for ZynqMP
  serial: xuartps: Add new compatible string for ZynqMP
  serial: SERIAL_STM32 should depend on HAS_DMA
  serial: stm32: Fix comparisons with undefined register
  tty: vt, fix bogus division in csi_J

Merge tag 'usb-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 4.9-rc3.

  There is the usual number of gadget and xhci patches in here to
  resolved reported issues, as well as some usb-serial driver fixes and
  new device ids.

  All have been in linux-next with no reported issues"

* tag 'usb-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: chipidea: host: fix NULL ptr dereference during shutdown
  usb: renesas_usbhs: add wait after initialization for R-Car Gen3
  usb: increase ohci watchdog delay to 275 msec
  usb: musb: Call pm_runtime from musb_gadget_queue
  usb: musb: Fix hardirq-safe hardirq-unsafe lock order error
  usb: ehci-platform: increase EHCI_MAX_RSTS to 4
  usb: ohci-at91: Set RemoteWakeupConnected bit explicitly.
  USB: serial: fix potential NULL-dereference at probe
  xhci: use default USB_RESUME_TIMEOUT when resuming ports.
  xhci: workaround for hosts missing CAS bit
  xhci: add restart quirk for Intel Wildcatpoint PCH
  USB: serial: cp210x: fix tiocmget error handling
  wusb: fix error return code in wusb_prf()
  Revert "Documentation: devicetree: dwc2: Deprecate g-tx-fifo-size"
  Revert "usb: dwc2: gadget: fix TX FIFO size and address initialization"
  Revert "usb: dwc2: gadget: change variable name to more meaningful"
  USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7
  wusb: Stop using the stack for sg crypto scratch space
  usb: dwc3: Fix size used in dma_free_coherent()
  usb: gadget: f_fs: stop sleeping in ffs_func_eps_disable
  ...

inet: Fix missing return value in inet6_hash

As part of a series to implement faster SO_REUSEPORT lookups,
commit ce69a4263fb8 ("sock: struct proto hash function may error")
added return values to protocol hash functions and
commit 518ae5aa0cda ("inet: create IPv6-equivalent inet_hash function")
implemented a new hash function for IPv6. However, the latter does
not respect the former's convention.

This properly propagates the hash errors in the IPv6 case.

Fixes: 518ae5aa0cda ("inet: create IPv6-equivalent inet_hash function")
Reported-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-fixes'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-10-25

This series contains some bug fixes for the mlx5 core and mlx5e driver.

From Daniel:
    - Cache line size determination at runtime, instead of using
      L1_CACHE_BYTES hard coded value, use cache_line_size()
    - Always Query HCA caps after setting them even on reset flow

From Mohamad:
    - Reorder netdev cleanup to uregister netdev before detaching it
      for the kernel to not complain about open resources such as vlans
    - Change the acl enable prototype to return status, for better error
      resiliency
    - Clear health sick bit when starting health poll after reset flow
    - Fix race between PCI error handlers and health work
    - PCI error recovery health care simulation, in case when the kernel
      PCI error handlers are not triggered for some internal firmware errors

From Noa:
    - Avoid passing dma address 0 to firmware when mapping system pages
      to the firmware

From Paul: Some straight forward flow steering fixes
    - Keep autogroups list ordered
    - Fix autogroups groups num not decreasing
    - Correctly initialize last use of flow counters

From Saeed:
    - Choose the nearest LRO timeout to the wanted one
      instead of blindly choosing "dev_cap.lro_timeout[2]"

This series has no conflict with the for-next pull request posted
earlier today ("Mellanox mlx5 core driver updates 2016-10-25").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Avoid passing dma address 0 to firmware

Currently the firmware can't work with a page with dma address 0.
Passing such an address to the firmware will cause the give_pages
command to fail.

To avoid this, in case we get a 0 dma address of a page from the
dma engine, we avoid passing it to FW by remapping to get an address
other than 0.

Fixes: 9b640ba030ac ('mlx5: Support communicating arbitrary host...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: PCI error recovery health care simulation

In case that the kernel PCI error handlers are not called, we will
trigger our own recovery flow.

The health work will give priority to the kernel pci error handlers to
recover the PCI by waiting for a small period, if the pci error handlers
are not triggered the manual recovery flow will be executed.

We don't save pci state in case of manual recovery because it will ruin the
pci configuration space and we will lose dma sync.

Fixes: c493bfa6edc2 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Fix race between PCI error handlers and health work

Currently there is a race between the health care work and the kernel
pci error handlers because both of them detect the error, the first one
to be called will do the error handling.
There is a chance that health care will disable the pci after resuming
pci slot.
Also create a separate WQ because now we will have two types of health
works, one for the error detection and one for the recovery.

Fixes: c493bfa6edc2 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Clear health sick bit when starting health poll

The health sick status should be cleared when we start the health poll.
This is crucial for driver reload (unload + load) in order to behave
right in case of health issue.

Fixes: f04c7c934374 ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Change the acl enable prototype to return status

The Ingress/Egress ACL enable function may fail and it should return
status to its caller to avoid NULL pointer dereference.

Fixes: c2349c465eab ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Unregister netdev before detaching it

Detaching the netdev before unregistering it cause some netdev cleanup
ndos to fail because they check presence of the netdev, so we need to
unregister the netdev first.

Fixes: a40d16f3cd0a ('net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Choose best nearest LRO timeout

Instead of predicting the index of the wanted LRO timeout value from
hardware capabilities, look for the nearest LRO timeout value.

Fixes: c35ae32560f2 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Correctly initialize last use of flow counters

Currently, last use timestamp is initialized to zero.
This is not the expected value by higher layers such as
when we do TC action offloading. To fix that, set it to
the current time, e.g when the counter/rule is offloaded.
This is the same behaviour of non-offloaded TC actions.

Fixes: c8efbeec6437 ('mlx5_core: Flow counters infrastructure')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Fix autogroups groups num not decreasing

Autogroups groups num is increased when creating a new flow group,
but is never decreased.

Now decreasing it when deleting a flow group.

Fixes: 32bf999bc9f2 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Keep autogroups list ordered

Finding a new autogroup range is done by going over a group list
sorted by each group start index. The search is stopped after finding
the first free range. Adding the newly created group to the list is
wrongly added to the end of the list regardless of its start index as
the parameter of where to insert it is ignored.

This commit makes sure to use that unused parameter to insert
it where requested.

Fixes: 32bf999bc9f2 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Always Query HCA caps after setting them

Always query the HCA caps after setting them to update the capablities
data structures. Not doing so results in incorrect capabilities being
reported including max_dc, max_qp and several others.

Fixes: a80765cc8d4f ("net/mlx5: Split the load/unload flow into hardware
and software flows")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

{net, ib}/mlx5: Make cache line size determination at runtime.

ARM 64B cache line systems have L1_CACHE_BYTES set to 128.
cache_line_size() will return the correct size.

Fixes: cf50b5efa2fe('net/mlx5_core/ib: New device capabilities
handling.')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: validate chunk len before actually using it

Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.

The fix is to just move the check upwards so it's also validated for the
1st chunk.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86/smpboot: Init apic mapping before usage

The recent changes, which forced the registration of the boot cpu on UP
systems, which do not have ACPI tables, have been fixed for systems w/o
local APIC, but left a wreckage for systems which have neither ACPI nor
mptables, but the CPU has an APIC, e.g. virtualbox.

The boot process crashes in prefill_possible_map() as it wants to register
the boot cpu, which needs to access the local apic, but the local APIC is
not yet mapped.

There is no reason why init_apic_mapping() can't be invoked before
prefill_possible_map(). So instead of playing another silly early mapping
game, as the ACPI/mptables code does, we just move init_apic_mapping()
before the call to prefill_possible_map().

In hindsight, I should have noticed that combination earlier.

Sorry for the churn (also in stable)!

Fixes: 4e85d875e9f4 ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com>
Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at>
Cc: prarit@redhat.com
Cc: ville.syrjala@linux.intel.com
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: frank.mehnert@oracle.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix recent ACPICA regressions, an older PCI IRQ management
  regression, and an incorrect return value of a function in the APEI
  code.

  Specifics:

   - Fix three ACPICA issues related to the interpreter locking and
     introduced by recent changes in that area (Lv Zheng).

   - Fix a PCI IRQ management regression introduced during the 4.7 cycle
     and related to the configuration of shared IRQs on systems with an
     ISA bus (Sinan Kaya).

   - Fix up a return value of one function in the APEI code (Punit
     Agrawal)"

* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
  ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
  ACPICA: Dispatcher: Fix order issue of method termination
  ACPI / APEI: Fix incorrect return value of ghes_proc()
  ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
  ACPI/PCI: pci_link: penalize SCI correctly
  ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages

Merge tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix two intel_pstate issues related to the way it works when the
  scaling_governor sysfs attribute is set to "performance" and fix up
  messages in the system suspend core code.

  Specifics:

   - Fix a missing KERN_CONT in a system suspend message by converting
     the affected code to using pr_info() and pr_cont() instead of the
     "raw" printk() (Jon Hunter).

   - Make intel_pstate set the CPU P-state from its .set_policy()
     callback when the scaling_governor sysfs attribute is set to
     "performance" so that it interacts with NOHZ_FULL more predictably
     which was the case before 4.7 (Rafael Wysocki).

   - Make intel_pstate always request the maximum allowed P-state when
     the scaling_governor sysfs attribute is set to "performance" to
     prevent it from effectively ingoring that setting is some
     situations (Rafael Wysocki)"

* tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Always set max P-state in performance mode
  PM / suspend: Fix missing KERN_CONT for suspend message
  cpufreq: intel_pstate: Set P-state upfront in performance mode