vhost-net: set packet weight of tx polling to 2 * vq size
handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
polling udp packets with small length(e.g. 1byte udp payload), because setting
VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
Ping-Latencies shown below were tested between two Virtual Machines using
netperf (UDP_STREAM, len=1), and then another machine pinged the client:
Seems pretty consistent, a small dip at 2 VQ sizes.
Ring size is a hint from device about a burst size it can tolerate. Based on
benchmarks, set the weight to 2 * vq size.
To evaluate this change, another tests were done using netperf(RR, TX) between
two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
tweaked through qemu. Results shown below does not show obvious changes.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com> Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: thunderx: rework mac addresses list to u64 array
It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.
This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.
Addresses-Coverity-ID: 1467429 ("Dereference null return value") Fixes: 4272642780a9 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is a false positive, because the inetpeer wont be erased
from rb-tree if the refcount_dec_if_one(&p->refcnt) does not
succeed. And this wont happen before first inet_putpeer() call
for this inetpeer has been done, and ->dtime field is written
exactly before the refcount_dec_and_test(&p->refcnt).
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dp83640: Ensure against premature access to PHY registers after reset
The datasheet specifies a 3uS pause after performing a software
reset. The default implementation of genphy_soft_reset() does not
provide this, so implement soft_reset with the needed pause.
Signed-off-by: Esben Haabendal <eha@deif.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Two sockmap fixes: i) fix a potential warning when a socket with
pending cork data is closed by freeing the memory right when the
socket is closed instead of seeing still outstanding memory at
garbage collector time, ii) fix a NULL pointer deref in case of
duplicates release calls, so make sure to only reset the sk_prot
pointer when it's in a valid state to do so, both from John.
2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
by moving the function under CONFIG_CGROUP_BPF ifdef since only
used there, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
devlink: convert occ_get op to separate registration
This resolves race during initialization where the resources with
ops are registered before driver and the structures used by occ_get
op is initialized. So keep occ_get callbacks registered only when
all structs are initialized.
The example flows, as it is in mlxsw:
1) driver load/asic probe:
mlxsw_core
-> mlxsw_sp_resources_register
-> mlxsw_sp_kvdl_resources_register
-> devlink_resource_register IDX
mlxsw_spectrum
-> mlxsw_sp_kvdl_init
-> mlxsw_sp_kvdl_parts_init
-> mlxsw_sp_kvdl_part_init
-> devlink_resource_size_get IDX (to get the current setup
size from devlink)
-> devlink_resource_occ_get_register IDX (register current
occupancy getter)
2) reload triggered by devlink command:
-> mlxsw_devlink_core_bus_device_reload
-> mlxsw_sp_fini
-> mlxsw_sp_kvdl_fini
-> devlink_resource_occ_get_unregister IDX
(struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
which is using mlxsw_sp would cause use-after free)
-> mlxsw_sp_init
-> mlxsw_sp_kvdl_init
-> mlxsw_sp_kvdl_parts_init
-> mlxsw_sp_kvdl_part_init
-> devlink_resource_size_get IDX (to get the current setup
size from devlink)
-> devlink_resource_occ_get_register IDX (register current
occupancy getter)
Fixes: 6de8841aed26 ("devlink: Add support for resource abstraction") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current (mildly evil) fsl_pq_mdio code uses an undocumented shadow of
the TBIPA register on LS1021A, which happens to be read-only.
Changing TBI PHY address therefore does not work on LS1021A.
The real (and documented) address of the TBIPA registere lies in the eTSEC
block and not in MDIO/MII, which is read/write, so using that fixes
the problem.
Signed-off-by: Esben Haabendal <eha@deif.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
This introduces a simpler and generic method for for finding (and mapping)
the TBIPA register.
Instead of relying of complicated logic for finding the TBIPA register
address based on the MDIO or MII register block base
address, which even in some cases relies on undocumented shadow registers,
a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
register.
Backwards compatibility is kept, as the existing logic is applied when
only a single "reg" mapping is specified.
Signed-off-by: Esben Haabendal <eha@deif.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ibmvnic: Fix driver reset and DMA bugs
This patch series introduces some fixes to the driver reset
routines and a patch that fixes mistakes caught by the kernel
DMA debugger.
The reset fixes include a fix to reset TX queue counters properly
after a reset as well as updates to driver reset error-handling code.
It also provides updates to the reset handling routine for redundant
backing VF failover and partition migration cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic: Do not reset CRQ for Mobility driver resets
When resetting the ibmvnic driver after a partition migration occurs
there is no requirement to do a reset of the main CRQ. The current
driver code does the required re-enable of the main CRQ, then does
a reset of the main CRQ later.
What we should be doing for a driver reset after a migration is to
re-enable the main CRQ, release all the sub-CRQs, and then allocate
new sub-CRQs after capability negotiation.
This patch updates the handling of mobility resets to do the proper
work and not reset the main CRQ. To do this the initialization/reset
of the main CRQ had to be moved out of the ibmvnic_init routine
and in to the ibmvnic_probe and do_reset routines.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 6 Apr 2018 23:37:05 +0000 (18:37 -0500)]
ibmvnic: Fix failover case for non-redundant configuration
There is a failover case for a non-redundant pseries VNIC
configuration that was not being handled properly. The current
implementation assumes that the driver will always have a redandant
device to communicate with following a failover notification. There
are cases, however, when a non-redundant configuration can receive
a failover request. If that happens, the driver should wait until
it receives a signal that the device is ready for operation.
The driver is agnostic of its backing hardware configuration,
so this fix necessarily affects all device failover management.
The driver needs to wait until it receives a signal that the device
is ready for resetting. A flag is introduced to track this intermediary
state where the driver is waiting for an active device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 6 Apr 2018 23:37:04 +0000 (18:37 -0500)]
ibmvnic: Fix reset scheduler error handling
In some cases, if the driver is waiting for a reset following
a device parameter change, failure to schedule a reset can result
in a hang since a completion signal is never sent.
If the device configuration is being altered by a tool such
as ethtool or ifconfig, it could cause the console to hang
if the reset request does not get scheduled. Add some additional
error handling code to exit the wait_for_completion if there is
one in progress.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 6 Apr 2018 23:37:03 +0000 (18:37 -0500)]
ibmvnic: Zero used TX descriptor counter on reset
The counter that tracks used TX descriptors pending completion
needs to be zeroed as part of a device reset. This change fixes
a bug causing transmit queues to be stopped unnecessarily and in
some cases a transmit queue stall and timeout reset. If the counter
is not reset, the remaining descriptors will not be "removed",
effectively reducing queue capacity. If the queue is over half full,
it will cause the queue to stall if stopped.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 6 Apr 2018 23:37:02 +0000 (18:37 -0500)]
ibmvnic: Fix DMA mapping mistakes
Fix some mistakes caught by the DMA debugger. The first change
fixes a unnecessary unmap that should have been removed in an
earlier update. The next hunk fixes another bad unmap by zeroing
the bit checked to determine that an unmap is needed. The final
change fixes some buffers that are unmapped with the wrong
direction specified.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sat, 7 Apr 2018 01:54:52 +0000 (18:54 -0700)]
tipc: use the right skb in tipc_sk_fill_sock_diag()
Commit a9c7d39c5ebd ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
tried to fix the crash but failed, the crash is still 100% reproducible
with it.
In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not
correct to retrieve its NETLINK_CB(), instead, like other protocol diag,
we should use NETLINK_CB(cb->skb).sk here.
Reported-by: <syzbot+326e587eff1074657718@syzkaller.appspotmail.com> Fixes: a9c7d39c5ebd ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag") Fixes: e9a4ef2a0a6e (tipc: implement socket diagnostics for AF_TIPC) Cc: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable description: ----address@SYSC_bind
Variable was created at:
SYSC_bind+0x6f/0x4b0 net/socket.c:1461
SyS_bind+0x54/0x80 net/socket.c:1460
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 7 Apr 2018 18:37:40 +0000 (20:37 +0200)]
net: dsa: Discard frames from unused ports
The Marvell switches under some conditions will pass a frame to the
host with the port being the CPU port. Such frames are invalid, and
should be dropped. Not dropping them can result in a crash when
incrementing the receive statistics for an invalid port.
Reported-by: Chris Healy <cphealy@gmail.com> Fixes: 676811fd1972 ("net: Distributed Switch Architecture protocol support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable description: ----addr@___sys_recvmsg
Variable was created at:
___sys_recvmsg+0xd5/0x810 net/socket.c:2172
__sys_recvmmsg+0x54e/0xdb0 net/socket.c:2313
Bytes 8-15 of 16 are uninitialized
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 1f87a46ec64e ("soreuseport: TCP/IPv4 implementation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable description: ----res.i.i@ip_route_output_flow
Variable was created at:
ip_route_output_flow+0x75/0x3c0 net/ipv4/route.c:2576
raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 41df2a575a7a ("net: introduce a list of device addresses dev_addr_list (v6)") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 7 Apr 2018 20:42:39 +0000 (13:42 -0700)]
net: initialize skb->peeked when cloning
syzbot reported __skb_try_recv_from_queue() was using skb->peeked
while it was potentially unitialized.
We need to clear it in __skb_clone()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 7 Apr 2018 20:42:38 +0000 (13:42 -0700)]
net: fix rtnh_ok()
syzbot reported :
BUG: KMSAN: uninit-value in rtnh_ok include/net/nexthop.h:11 [inline]
BUG: KMSAN: uninit-value in fib_count_nexthops net/ipv4/fib_semantics.c:469 [inline]
BUG: KMSAN: uninit-value in fib_create_info+0x554/0x8d20 net/ipv4/fib_semantics.c:1091
@remaining is an integer, coming from user space.
If it is negative we want rtnh_ok() to return false.
Fixes: be8a7cf82fb5 ("[IPv4]: FIB configuration using struct fib_config") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 7 Apr 2018 20:42:37 +0000 (13:42 -0700)]
netlink: fix uninit-value in netlink_sendmsg
syzbot reported :
BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline]
BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 7 Apr 2018 20:42:36 +0000 (13:42 -0700)]
crypto: af_alg - fix possible uninit-value in alg_bind()
syzbot reported :
BUG: KMSAN: uninit-value in alg_bind+0xe3/0xd90 crypto/af_alg.c:162
We need to check addr_len before dereferencing sa (or uaddr)
Fixes: f27b3a63712a ("crypto: af_alg - whitelist mask and type") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Stephan Mueller <smueller@chronox.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sat, 7 Apr 2018 00:19:41 +0000 (17:19 -0700)]
net_sched: fix a missing idr_remove() in u32_delete_key()
When we delete a u32 key via u32_delete_key(), we forget to
call idr_remove() to remove its handle from IDR.
Fixes: 207768d9ea13 ("net_sched: use idr to allocate u32 filter handles") Reported-by: Marcin Kabiesz <admin@hostcenter.eu> Tested-by: Marcin Kabiesz <admin@hostcenter.eu> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell: Enable interrupt function on LED2 pin
The LED2[2]/INTn pin on Marvell 88E1318S as well as 88E1510/12/14/18 needs
to be configured to be usable as interrupt not only when WOL is enabled,
but whenever we rely on interrupts from the PHY.
Signed-off-by: Esben Haabendal <eha@deif.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 28 Mar 2018 12:50:45 +0000 (12:50 +0000)]
ice: Fix error return code in ice_init_hw()
Fix to return error code ICE_ERR_NO_MEMORY from the alloc error
handling case instead of 0, as done elsewhere in this function.
Fixes: 7823dcda65c7 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
net/sched: fix NULL dereference in the error path of tcf_bpf_init()
when tcf_bpf_init_from_ops() fails (e.g. because of program having invalid
number of instructions), tcf_bpf_cfg_cleanup() calls bpf_prog_put(NULL) or
bpf_prog_destroy(NULL). Unless CONFIG_BPF_SYSCALL is unset, this causes
the following error:
Fix it in tcf_bpf_cfg_cleanup(), ensuring that bpf_prog_{put,destroy}(f)
is called only when f is not NULL.
Fixes: fe2fe6eed0ab ("net/sched: fix idr leak on the error path of tcf_bpf_init()") Reported-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Barnhill [Thu, 5 Apr 2018 21:29:47 +0000 (21:29 +0000)]
net/ipv6: Increment OUTxxx counters after netfilter hook
At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and
IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call
for NFPROTO_IPV6 / NF_INET_FORWARD. As a result, these counters get
incremented regardless of whether or not the netfilter hook allows the
packet to continue being processed. This change increments the counters
in ip6_forward_finish() so that it will not happen if the netfilter hook
chooses to terminate the packet, which is similar to how IPv4 works.
Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
hv_netvsc: Fix shutdown issues on older Windows hosts
Guests running on WS2012 hosts would not shutdown when changing network
interface setting (e.g. Number of channels, MTU ... etc).
This patch series addresses these shutdown issues we enecountered with WS2012
hosts. It's essentialy a rework of the series sent in
https://lkml.org/lkml/2018/1/23/111 on top of latest upstream
====================
Fixes: b3ad3b718aa0 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammed Gamal [Thu, 5 Apr 2018 19:09:21 +0000 (21:09 +0200)]
hv_netvsc: Pass net_device parameter to revoke and teardown functions
The callers to netvsc_revoke_*_buf() and netvsc_teardown_*_gpadl()
already have their net_device instances. Pass them as a paramaeter to
the function instead of obtaining them from netvsc_device struct
everytime
Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammed Gamal [Thu, 5 Apr 2018 19:09:20 +0000 (21:09 +0200)]
hv_netvsc: Ensure correct teardown message sequence order
Prior to commit 30a734dfed91 ("hv_netvsc: netvsc_teardown_gpadl() split")
the call sequence in netvsc_device_remove() was as follows (as
implemented in netvsc_destroy_buf()):
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Teardown receive buffer GPADL
3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
4- Teardown send buffer GPADL
5- Close vmbus
This didn't work for WS2016 hosts. Commit 30a734dfed91
("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the
teardown sequence as follows:
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Close vmbus
4- Teardown receive buffer GPADL
5- Teardown send buffer GPADL
That worked well for WS2016 hosts, but it prevented guests on older hosts from
shutting down after changing network settings. Commit b3ad3b718aa0
("hv_netvsc: change GPAD teardown order on older versions") ensured the
following message sequence for older hosts
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Teardown receive buffer GPADL
4- Teardown send buffer GPADL
5- Close vmbus
However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the
process becomes uninterruptible. On futher analysis it turns out that on tearing
down the receive buffer GPADL the kernel is waiting indefinitely
in vmbus_teardown_gpadl() for a completion to be signaled.
Here is a snippet of where this occurs:
int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
{
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
int ret;
info = kmalloc(sizeof(*info) +
sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
if (!info)
return -ENOMEM;
The completion is signaled from vmbus_ongpadl_torndown(), which gets called when
the corresponding message is received from the host, which apparently never happens
in that case.
This patch works around the issue by restoring the first mentioned message sequence
for older hosts
Fixes: b3ad3b718aa0 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammed Gamal [Thu, 5 Apr 2018 19:09:19 +0000 (21:09 +0200)]
hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
Split each of the functions into two for each of send/recv buffers.
This will be needed in order to implement a fine-grained messaging
sequence to the host so that we accommodate the requirements of
different Windows versions
Fixes: b3ad3b718aa06 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammed Gamal [Thu, 5 Apr 2018 19:09:18 +0000 (21:09 +0200)]
hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown
When changing network interface settings, Windows guests
older than WS2016 can no longer shutdown. This was addressed
by commit b3ad3b718aa06 ("hv_netvsc: change GPAD teardown order
on older versions"), however the issue also occurs on WS2012
guests that share NVSP protocol versions with WS2016 guests.
Hence we use Windows version directly to differentiate them.
Fixes: b3ad3b718aa06 ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Boundary check in mvpp2_prs_init_from_hw must be done according to the
passed "tid" parameter, not the mvpp2_prs_entry index, which is not yet
initialized at the time of the check.
Fixes: ee29f7301cc3 ("net: mvpp2: Make mvpp2_prs_hw_read a parser entry init function") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
arp_filter performs an ip_route_output search for arp source address and
checks if output device is the same where the arp request was received,
if it is not, the arp request is not answered.
This route lookup is always done on main route table so l3slave devices
never find the proper route and arp is not answered.
Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the
lookup for l3slave devices while maintaining same behavior for non
l3slave devices as this function returns 0 in that case.
Fixes: b8ceeef2ca2c ("net: Use VRF device index for lookups on TX") Signed-off-by: Miguel Fadon Perlines <mfadon@teldat.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 5 Apr 2018 19:16:15 +0000 (15:16 -0400)]
Merge branch 'net-tunnel-name-validate'
Eric Dumazet says:
====================
net: better validate user provided tunnel names
This series changes dev_valid_name() to not attempt reading
a possibly too long user-provided device name, then use
this helper in five different tunnel providers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Apr 2018 13:39:31 +0000 (06:39 -0700)]
vti6: better validate user provided tunnel names
Use valid_name() to make sure user does not provide illegal
device name.
Fixes: 362a48aabff4 ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Apr 2018 13:39:29 +0000 (06:39 -0700)]
ip6_gre: better validate user provided tunnel names
Use dev_valid_name() to make sure user does not provide illegal
device name.
syzbot caught the following bug :
BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339
Write of size 20 at addr ffff8801afb9f7b8 by task syzkaller851048/4466
Fixes: 049b2892814b ("gre: Support GRE over IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Apr 2018 13:39:28 +0000 (06:39 -0700)]
ipv6: sit: better validate user provided tunnel names
Use dev_valid_name() to make sure user does not provide illegal
device name.
syzbot caught the following bug :
BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254
Write of size 33 at addr ffff8801b64076d8 by task syzkaller932654/4453
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Apr 2018 13:39:27 +0000 (06:39 -0700)]
ip_tunnel: better validate user provided tunnel names
Use dev_valid_name() to make sure user does not provide illegal
device name.
syzbot caught the following bug :
BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482
Fixes: e7ccb3806ba4 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
"Here is the big set of char/misc driver patches for 4.17-rc1.
There are a lot of little things in here, nothing huge, but all
important to the different hardware types involved:
- thunderbolt driver updates
- parport updates (people still care...)
- nvmem driver updates
- mei updates (as always)
- hwtracing driver updates
- hyperv driver updates
- extcon driver updates
- ... and a handful of even smaller driver subsystem and individual
driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
hwtracing: Add HW tracing support menu
intel_th: Add ACPI glue layer
intel_th: Allow forcing host mode through drvdata
intel_th: Pick up irq number from resources
intel_th: Don't touch switch routing in host mode
intel_th: Use correct method of finding hub
intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
stm class: Make dummy's master/channel ranges configurable
stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
hv: add SPDX license id to Kconfig
hv: add SPDX license to trace
Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
/dev/mem: Avoid overwriting "err" in read_mem()
eeprom: at24: use SPDX identifier instead of GPL boiler-plate
eeprom: at24: simplify the i2c functionality checking
eeprom: at24: fix a line break
eeprom: at24: tweak newlines
eeprom: at24: refactor at24_probe()
...
Merge tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core patches for 4.17-rc1.
There's really not much here, just a bunch of firmware code
refactoring from Luis as he attempts to wrangle that codebase into
something that is managable, along with a bunch of userspace tests for
it. Other than that, a handful of small bugfixes and reverts of things
that didn't work out.
Full details are in the shortlog, it's not all that much.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
drivers: base: remove check for callback in coredump_store()
mt7601u: use firmware_request_cache() to address cache on reboot
firmware: add firmware_request_cache() to help with cache on reboot
firmware: fix typo on pr_info_once() when ignore_sysfs_fallback is used
firmware: explicitly include vmalloc.h
firmware: ensure the firmware cache is not used on incompatible calls
test_firmware: modify custom fallback tests to use unique files
firmware: add helper to check to see if fw cache is setup
firmware: fix checking for return values for fw_add_devm_name()
rename: _request_firmware_load() fw_load_sysfs_fallback()
test_firmware: test three firmware kernel configs using a proc knob
test_firmware: expand on library with shared helpers
firmware: enable to force disable the fallback mechanism at run time
firmware: enable run time change of forcing fallback loader
firmware: move firmware loader into its own directory
firmware: split firmware fallback functionality into its own file
firmware: move loading timeout under struct firmware_fallback_config
firmware: use helpers for setting up a temporary cache timeout
firmware: simplify CONFIG_FW_LOADER_USER_HELPER_FALLBACK further
drivers: base: add description for .coredump() callback
...
Merge tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big set of Staging/IIO driver patches for 4.17-rc1.
It is a lot, over 500 changes, but not huge by previous kernel release
standards. We deleted more lines than we added again (27k added vs.
91k remvoed), thanks to finally being able to delete the IRDA drivers
and networking code.
We also deleted the ccree crypto driver, but that's coming back in
through the crypto tree to you, in a much cleaned-up form.
Added this round is at lot of "mt7621" device support, which is for an
embedded device that Neil Brown cares about, and of course a handful
of new IIO drivers as well.
And finally, the fsl-mc core code moved out of the staging tree to the
"real" part of the kernel, which is nice to see happen as well.
Full details are in the shortlog, which has all of the tiny cleanup
patches described.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (579 commits)
staging: rtl8723bs: Remove yield call, replace with cond_resched()
staging: rtl8723bs: Replace yield() call with cond_resched()
staging: rtl8723bs: Remove unecessary newlines from 'odm.h'.
staging: rtl8723bs: Rework 'struct _ODM_Phy_Status_Info_' coding style.
staging: rtl8723bs: Rework 'struct _ODM_Per_Pkt_Info_' coding style.
staging: rtl8723bs: Replace NULL pointer comparison with '!'.
staging: rtl8723bs: Factor out rtl8723bs_recv_tasklet() sections.
staging: rtl8723bs: Fix function signature that goes over 80 characters.
staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().
staging: rtl8723bs: Remove unnecessary blank lines in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Change camel case to snake case in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Add missing braces in else statement.
staging: rtl8723bs: Add spaces around ternary operators.
staging: rtl8723bs: Fix lines with trailing open parentheses.
staging: rtl8723bs: Remove unnecessary length #define's.
staging: rtl8723bs: Fix IEEE80211 authentication algorithm constants.
staging: rtl8723bs: Fix alignment in rtw_wx_set_auth().
staging: rtl8723bs: Remove braces from single statement conditionals.
staging: rtl8723bs: Remove unecessary braces from switch statement.
staging: rtl8723bs: Fix newlines in rtw_wx_set_auth().
...
Merge tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver patches for 4.17-rc1
Not all that big really, most are just small fixes and additions to
existing drivers. There's a bunch of work on the imx serial driver
recently for some reason, and a new embedded serial driver added as
well.
Full details are in the shortlog.
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
serial: expose buf_overrun count through proc interface
serial: mvebu-uart: fix tx lost characters
tty: serial: msm_geni_serial: Fix return value check in qcom_geni_serial_probe()
tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
8250-men-mcb: add support for 16z025 and 16z057
powerpc: Mark the variable earlycon_acpi_spcr_enable maybe_unused
serial: stm32: fix initialization of RS485 mode
ARM: dts: STi: Remove "console=ttyASN" from bootargs for STi boards
vt: change SGR 21 to follow the standards
serdev: Fix typo in serdev_device_alloc
ARM: dts: STi: Fix aliases property name for STi boards
tty: st-asc: Update tty alias
serial: stm32: add support for RS485 hardware control mode
dt-bindings: serial: stm32: add RS485 optional properties
selftests: add devpts selftests
devpts: comment devpts_mntget()
devpts: resolve devpts bind-mounts
devpts: hoist out check for DEVPTS_SUPER_MAGIC
serial: 8250: Add Nuvoton NPCM UART
serial: mxs-auart: disable clks of Alphascale ASM9260
...
Merge tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/PHY updates from Greg KH:
"Here is the big set of USB and PHY driver patches for 4.17-rc1.
Lots of USB typeC work happened this round, with code moving from the
staging directory into the "real" part of the kernel, as well as new
infrastructure being added to be able to handle the different types of
"roles" that typeC requires.
There is also the normal huge set of USB gadget controller and driver
updates, along with XHCI changes, and a raft of other tiny fixes all
over the USB tree. And the PHY driver updates are merged in here as
well as they interacted with the USB drivers in some places.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (250 commits)
Revert "USB: serial: ftdi_sio: add Id for Physik Instrumente E-870"
usb: musb: gadget: misplaced out of bounds check
usb: chipidea: imx: Fix ULPI on imx53
usb: chipidea: imx: Cleanup ci_hdrc_imx_platform_flag
usb: chipidea: usbmisc: small clean up
usb: chipidea: usbmisc: evdo can be set e/o reset
usb: chipidea: usbmisc: evdo is only specific to OTG port
USB: serial: ftdi_sio: add Id for Physik Instrumente E-870
usb: dwc3: gadget: never call ->complete() from ->ep_queue()
usb: gadget: udc: core: update usb_ep_queue() documentation
usb: host: Remove the deprecated ATH79 USB host config options
usb: roles: Fix return value check in intel_xhci_usb_probe()
USB: gadget: f_midi: fixing a possible double-free in f_midi
usb: core: Add USB_QUIRK_DELAY_CTRL_MSG to usbcore quirks
usb: core: Copy parameter string correctly and remove superfluous null check
USB: announce bcdDevice as well as idVendor, idProduct.
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: hub: Reduce warning to notice on power loss
USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator
USB: serial: cp210x: add ELDAT Easywave RX09 id
...
Pull networking fixes from David Miller:
"This fixes some fallout from the net-next merge the other day, plus
some non-merge-window-related bug fixes:
1) Fix sparse warnings in bcmgenet, systemport, b53, and mt7530
(Florian Fainelli)
2) pptp does a bogus dst_release() on a route we have a single
refcount on, and attached to a socket, which needs that refcount
(Eric Dumazet)
3) UDP connected sockets on ipv6 can race with route update handling,
resulting in a pre-PMTU update route still stuck on the socket and
thus continuing to get ICMPV6_PKT_TOOBIG errors. We end up never
seeing the updated route. (Alexey Kodanev)
4) Missing list initializer(s) in TIPC (Jon Maloy)
5) Connect phy early to prevent crashes in lan78xx driver (Alexander
Graf)
6) Fix build with modular NVMEM (Arnd Bergmann)
7) netdevsim canot mark nsim_devlink_net_ops and nsim_fib_net_ops as
__net_initdata, as these are references from module unload
unconditionally (Arnd Bergmann)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
netdevsim: remove incorrect __net_initdata annotations
sfc: remove ctpio_dmabuf_start from stats
inet: frags: fix ip6frag_low_thresh boundary
tipc: Fix namespace violation in tipc_sk_fill_sock_diag
net: avoid unneeded atomic operation in ip*_append_data()
nvmem: disallow modular CONFIG_NVMEM
net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
nfp: use full 40 bits of the NSP buffer address
lan78xx: Connect phy early
nfp: add a separate counter for packets with CHECKSUM_COMPLETE
tipc: Fix missing list initializations in struct tipc_subscription
ipv6: udp: set dst cache for a connected sk if current not valid
ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()
ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
ipv6: add a wrapper for ip6_dst_store() with flowi6 checks
net: phy: marvell10g: add thermal hwmon device
pptp: remove a buggy dst release in pptp_connect()
net: dsa: mt7530: Use NULL instead of plain integer
net: dsa: b53: Fix sparse warnings in b53_mmap.c
af_unix: remove redundant lockdep class
...
Merge tag 'riscv-for-linus-4.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains the new features we'd like to incorporate into the
RISC-V port for 4.17. We might have a bit more stuff land later in the
merge window, but I wanted to get this out earlier just so everyone
can see where we currently stand.
A short summary of the changes is:
- We've added support for dynamic ftrace on RISC-V targets.
- There have been a handful of cleanups to our atomic and locking
routines. They now more closely match the released RISC-V memory
model draft.
- Our module loading support has been cleaned up and is now enabled
by default, despite some limitations still existing.
- A patch to define COMMANDLINE_FORCE instead of COMMANDLINE_OVERRIDE
so the generic device tree code picks up handling all our command
line stuff.
There's more information in the merge commits for each patch set"
* tag 'riscv-for-linus-4.17-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (21 commits)
RISC-V: Rename CONFIG_CMDLINE_OVERRIDE to CONFIG_CMDLINE_FORCE
RISC-V: Add definition of relocation types
RISC-V: Enable module support in defconfig
RISC-V: Support SUB32 relocation type in kernel module
RISC-V: Support ADD32 relocation type in kernel module
RISC-V: Support ALIGN relocation type in kernel module
RISC-V: Support RVC_BRANCH/JUMP relocation type in kernel modulewq
RISC-V: Support HI20/LO12_I/LO12_S relocation type in kernel module
RISC-V: Support CALL relocation type in kernel module
RISC-V: Support GOT_HI20/CALL_PLT relocation type in kernel module
RISC-V: Add section of GOT.PLT for kernel module
RISC-V: Add sections of PLT and GOT for kernel module
riscv/atomic: Strengthen implementations with fences
riscv/spinlock: Strengthen implementations with fences
riscv/barrier: Define __smp_{store_release,load_acquire}
riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support
riscv/ftrace: Add DYNAMIC_FTRACE_WITH_REGS support
riscv/ftrace: Add ARCH_SUPPORTS_FTRACE_OPS support
riscv/ftrace: Add dynamic function graph tracer support
riscv/ftrace: Add dynamic function tracer support
...
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual pile of boring changes:
- Consolidate tasklet functions to share code instead of duplicating
it
- The first step for making the low level entry handler management on
multi-platform kernels generic
- A new sysfs file which allows to retrieve the wakeup state of
interrupts.
- Ensure that the interrupt thread follows the effective affinity and
not the programmed affinity to avoid cross core wakeups.
- Two new interrupt controller drivers (Microsemi Ocelot and Qualcomm
PDC)
- Fix the wakeup path clock handling for Reneasas interrupt chips.
- Rework the boot time register reset for ARM GIC-V2/3
- Better suspend/resume support for ARM GIV-V3/ITS
- Add missing locking to the ARM GIC set_type() callback
- Small fixes for the irq simulator code
- SPDX identifiers for the irq core code and removal of boiler plate
- Small cleanups all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
openrisc: Set CONFIG_MULTI_IRQ_HANDLER
arm64: Set CONFIG_MULTI_IRQ_HANDLER
genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
irqchip/gic: Take lock when updating irq type
irqchip/gic: Update supports_deactivate static key to modern api
irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling
irqchip: Add a driver for the Microsemi Ocelot controller
dt-bindings: interrupt-controller: Add binding for the Microsemi Ocelot interrupt controller
irqchip/gic-v3: Probe for SCR_EL3 being clear before resetting AP0Rn
irqchip/gic-v3: Don't try to reset AP0Rn
irqchip/gic-v3: Do not check trigger configuration of partitionned LPIs
genirq: Remove license boilerplate/references
genirq: Add missing SPDX identifiers
genirq/matrix: Cleanup SPDX identifier
genirq: Cleanup top of file comments
genirq: Pass desc to __irq_free instead of irq number
irqchip/gic-v3: Loudly complain about the use of IRQ_TYPE_NONE
irqchip/gic: Loudly complain about the use of IRQ_TYPE_NONE
RISC-V: Move to the new GENERIC_IRQ_MULTI_HANDLER handler
genirq: Add CONFIG_GENERIC_IRQ_MULTI_HANDLER
...
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time(r) updates from Thomas Gleixner:
"A small set of updates for timers and timekeeping:
- The most interesting change is the consolidation of clock MONOTONIC
and clock BOOTTIME.
Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
not longer ignore the time spent in suspend. A new clock
MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
kernels before this change. This allows applications to
programmatically check for the clock MONOTONIC behaviour.
As discussed in the review thread, this has the potential of
breaking user space and we might have to revert this. Knock on wood
that we can avoid that exercise.
- Updates to the NTP mechanism to improve accuracy
- A new kernel internal data structure to aid the ongoing Y2038 work.
- Cleanups and simplifications of the clocksource code.
- Make the alarmtimer code play nicely with debugobjects"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Init nanosleep alarm timer on stack
y2038: Introduce struct __kernel_old_timeval
tracing: Unify the "boot" and "mono" tracing clocks
hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Remove boot time specific code
Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
timekeeping/ntp: Determine the multiplier directly from NTP tick length
timekeeping/ntp: Don't align NTP frequency adjustments to ticks
clocksource: Use ATTRIBUTE_GROUPS
clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
clocksource: Don't walk the clocksource list for empty override
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random updates from Ted Ts'o:
"A few random (cough, cough) cleanups for the /dev/random driver"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
drivers/char/random.c: remove unused dont_count_entropy
random: optimize add_interrupt_randomness
random: always fill buffer in get_random_bytes_wait
random: use a tighter cap in credit_entropy_bits_safe()
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Cleanups and bugfixes for ext4, including some fixes to make ext4 more
robust against maliciously crafted file system images.
(I still don't recommend that container folks hold any delusions that
mounting arbitary images that can be crafted by malicious attackers
should be considered sane thing to do, though!)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (29 commits)
ext4: force revalidation of directory pointer after seekdir(2)
ext4: add extra checks to ext4_xattr_block_get()
ext4: add bounds checking to ext4_xattr_find_entry()
ext4: move call to ext4_error() into ext4_xattr_check_block()
ext4: don't show data=<mode> option if defaulted
ext4: omit init_itable=n in procfs when disabled
ext4: show more binary mount options in procfs
ext4: simplify kobject usage
ext4: remove unused parameters in sysfs code
ext4: null out kobject* during sysfs cleanup
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
ext4: always initialize the crc32c checksum driver
ext4: fail ext4_iget for root directory if unallocated
ext4: limit xattr size to INT_MAX
ext4: add validity checks for bitmap block numbers
ext4: fix comments in ext4_swap_extents()
ext4: use generic_writepages instead of __writepage/write_cache_pages
ext4: don't complain about incorrect features when probing
ext4: remove EXT4_STATE_DIOREAD_LOCK flag
ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()
...
Merge tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"Includes SMB3.11 security improvements, as well as various fixes for
stable and some debugging improvements"
* tag '4.17-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add minor debug message during negprot
smb3: Fix root directory when server returns inode number of zero
cifs: fix sparse warning on previous patch in a few printks
cifs: add server->vals->header_preamble_size
cifs: smbd: disconnect transport on RDMA errors
cifs: smbd: avoid reconnect lockup
Don't log confusing message on reconnect by default
Don't log expected error on DFS referral request
fs: cifs: Replace _free_xid call in cifs_root_iget function
SMB3.1.1 dialect is no longer experimental
Tree connect for SMB3.1.1 must be signed for non-encrypted shares
fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
CIFS: fix sha512 check in cifs_crypto_secmech_release
CIFS: implement v3.11 preauth integrity
CIFS: add sha512 secmech
CIFS: refactor crypto shash/sdesc allocation&free
Update README file for cifs.ko
Update TODO list for cifs.ko
cifs: fix memory leak in SMB2_open()
CIFS: SMBD: fix spelling mistake: "faield" and "legnth"
Merge tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've only got nine GFS2 patches for this merge window:
- report journal recovery times more accurately during journal replay
(Abhi Das)
- fix fallocate chunk size (Andreas Gruenbacher)
- correctly dirty inodes during rename (Andreas Gruenbacher)
- improve the comment for function gfs2_block_map (Andreas
Gruenbacher)
- improve kernel trace point iomap end: The physical block address
was added (Andreas Gruenbacher)
- fix a nasty file system corruption bug that surfaced in xfstests
476 in punch-hole/truncate (Andreas Gruenbacher)
- fix a problem Christoph Helwig pointed out, namely, that GFS2 was
misusing the IOMAP_ZERO flag. The zeroing of new blocks was moved
to the proper fallocate code (Andreas Gruenbacher)
- declare function gfs2_remove_from_ail as static (Bob Peterson)
- only set PageChecked for jdata page writes (Bob Peterson)"
* tag 'gfs2-4.17.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: time journal recovery steps accurately
gfs2: Zero out fallocated blocks in fallocate_chunk
gfs2: Check for the end of metadata in punch_hole
gfs2: gfs2_iomap_end tracepoint: log block address
gfs2: Improve gfs2_block_map comment
GFS2: Only set PageChecked for jdata pages
GFS2: Make function gfs2_remove_from_ail static
gfs2: Dirty source inode during rename
gfs2: Fix fallocate chunk size
Merge tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"There are a several user visible changes, the rest is mostly invisible
and continues to clean up the whole code base.
User visible changes:
- new mount option nossd_spread (pair for ssd_spread)
- mount option subvolid will detect junk after the number and fail
the mount
- add message after cancelled device replace
- direct module dependency on libcrc32, removed own crc wrappers
- removed user space transaction ioctls
- use lighter locking when reading /proc/self/mounts, RCU instead of
mutex to avoid unnecessary contention
Enhancements:
- skip writeback of last page when truncating file to same size
- send: do not issue unnecessary truncate operations
- mount option token specifiers: use %u for unsigned values, more
validation
- selftests: more tree block validations
qgroups:
- preparatory work for splitting reservation types for data and
metadata, this should allow for more accurate tracking and fix some
issues with underflows or do further enhancements
- split metadata reservations for started and joined transaction so
they do not get mixed up and are accounted correctly at commit time
- with the above, it's possible to revert patch that potentially
deadlocks when trying to make more space by explicitly committing
when the quota limit is hit
- fix root item corruption when multiple same source snapshots are
created with quota enabled
RAID56:
- make sure target is identical to source when raid56 rebuild fails
after dev-replace
- faster rebuild during scrub, batch by stripes and not
block-by-block
- make more use of cached data when rebuilding from a missing device
Fixes:
- null pointer deref when device replace target is missing
- fix fsync after hole punching when using no-holes feature
- fix lockdep splat when allocating percpu data with wrong GFP flags
Cleanups, refactoring, core changes:
- drop redunant parameters from various functions
- kill and opencode trivial helpers
- __cold/__exit function annotations
- dead code removal
- continued audit and documentation of memory barriers
- error handling: handle removal from uuid tree
- error handling: remove handling of impossible condtitons
- more debugging or error messages
- updated tracepoints
- one VLA use removal (and one still left)"
* tag 'for-4.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
btrfs: lift errors from add_extent_changeset to the callers
Btrfs: print error messages when failing to read trees
btrfs: user proper type for btrfs_mask_flags flags
btrfs: split dev-replace locking helpers for read and write
btrfs: remove stale comments about fs_mutex
btrfs: use RCU in btrfs_show_devname for device list traversal
btrfs: update barrier in should_cow_block
btrfs: use lockdep_assert_held for mutexes
btrfs: use lockdep_assert_held for spinlocks
btrfs: Validate child tree block's level and first key
btrfs: tests/qgroup: Fix wrong tree backref level
Btrfs: fix copy_items() return value when logging an inode
Btrfs: fix fsync after hole punching when using no-holes feature
btrfs: use helper to set ulist aux from a qgroup
Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
btrfs: qgroup: Update trace events for metadata reservation
btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
btrfs: qgroup: Use separate meta reservation type for delalloc
btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
...
Merge tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"Here's the first round of fixes for XFS for 4.17.
The biggest new features this time around are the addition of lazytime
support, further enhancement of the on-disk inode metadata verifiers,
and a patch to smooth over some of the AGFL padding problems that have
intermittently plagued users since 4.5. I forsee sending a second pull
request next week with further bug fixes and speedups in the online
scrub code and elsewhere.
This series has been run through a full xfstests run over the weekend
and through a quick xfstests run against this morning's master, with
no major failures reported.
Summary of changes for this release:
- Various cleanups and code fixes
- Implement lazytime as a mount option
- Convert various on-disk metadata checks from asserts to -EFSCORRUPTED
- Fix accounting problems with the rmap per-ag reservations
- Refactorings and cleanups for xfs_log_force
- Various bugfixes for the reflink code
- Work around v5 AGFL padding problems to prevent fs shutdowns
- Establish inode fork verifiers to inspect on-disk metadata
correctness
- Various online scrub fixes
- Fix v5 swapext blowing up on deleted inodes"
* tag 'xfs-4.17-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (49 commits)
xfs: do not log/recover swapext extent owner changes for deleted inodes
xfs: clean up xfs_mount allocation and dynamic initializers
xfs: remove dead inode version setting code
xfs: catch inode allocation state mismatch corruption
xfs: xfs_scrub_iallocbt_xref_rmap_inodes should use xref_set_corrupt
xfs: flag inode corruption if parent ptr doesn't get us a real inode
xfs: don't accept inode buffers with suspicious unlinked chains
xfs: move inode extent size hint validation to libxfs
xfs: record inode buf errors as a xref error in inobt scrubber
xfs: remove xfs_buf parameter from inode scrub methods
xfs: inode scrubber shouldn't bother with raw checks
xfs: bmap scrubber should do rmap xref with bmap for sparse files
xfs: refactor inode buffer verifier error logging
xfs: refactor inode verifier error logging
xfs: refactor bmap record validation
xfs: sanity-check the unused space before trying to use it
xfs: detect agfl count corruption and reset agfl
xfs: unwind the try_again loop in xfs_log_force
xfs: refactor xfs_log_force_lsn
xfs: minor cleanup for xfs_reflink_end_cow
...
Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs dcache updates from Al Viro:
"Part of this is what the trylock loop elimination series has turned
into, part making d_move() preserve the parent (and thus the path) of
victim, plus some general cleanups"
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
d_genocide: move export to definition
fold dentry_lock_for_move() into its sole caller and clean it up
make non-exchanging __d_move() copy ->d_parent rather than swap them
oprofilefs: don't oops on allocation failure
lustre: get rid of pointless casts to struct dentry *
debugfs_lookup(): switch to lookup_one_len_unlocked()
fold lookup_real() into __lookup_hash()
take out orphan externs (empty_string/slash_string)
split d_path() and friends into a separate file
dcache.c: trim includes
fs/dcache: Avoid a try_lock loop in shrink_dentry_list()
get rid of trylock loop around dentry_kill()
handle move to LRU in retain_dentry()
dput(): consolidate the "do we need to retain it?" into an inlined helper
split the slow part of lock_parent() off
now lock_parent() can't run into killed dentry
get rid of trylock loop in locking dentries on shrink list
d_delete(): get rid of trylock loop
fs/dcache: Move dentry_kill() below lock_parent()
fs/dcache: Remove stale comment from dentry_kill()
...
The __net_initdata section cannot currently be used for structures that
get cleaned up in an exitcall using unregister_pernet_operations:
WARNING: vmlinux.o(.text+0x868c34): Section mismatch in reference from the function nsim_devlink_exit() to the (unknown reference) .init.data:(unknown)
The function nsim_devlink_exit() references
the (unknown reference) __initdata (unknown).
This is often because nsim_devlink_exit lacks a __initdata
annotation or the annotation of (unknown) is wrong.
WARNING: vmlinux.o(.text+0x868c64): Section mismatch in reference from the function nsim_devlink_init() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x8692bc): Section mismatch in reference from the function nsim_fib_exit() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x869300): Section mismatch in reference from the function nsim_fib_init() to the (unknown reference) .init.data:(unknown)
As that warning tells us, discarding the structure after a module is
loaded would lead to a undefined behavior when that module is removed.
It might be possible to change that annotation so it has no effect for
loadable modules, but I have not figured out exactly how to do that, and
we want this to be fixed in -rc1.
This just removes the annotations, just like we do for all other such
modules.
Fixes: 12c6871ba1a1 ("netdevsim: Add simple FIB resource controller via devlink") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The ctpio_dmabuf_start entry is not actually a stat and shouldn't
be exposed to ethtool.
Fixes: 00a0c984ba34 ("sfc: expose CTPIO stats on NICs that support them") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Apr 2018 15:35:10 +0000 (08:35 -0700)]
inet: frags: fix ip6frag_low_thresh boundary
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.
ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.
Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.
Fixes: ab7d86f00e47 ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: Fix namespace violation in tipc_sk_fill_sock_diag
To fetch UID info for socket diagnostics, we determine the
namespace of user context using tipc socket instance. This
may cause namespace violation, as the kernel will remap based
on UID.
We fix this by fetching namespace info using the calling userspace
netlink socket.
Fixes: e9a4ef2a0a6e (tipc: implement socket diagnostics for AF_TIPC) Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 4 Apr 2018 12:30:01 +0000 (14:30 +0200)]
net: avoid unneeded atomic operation in ip*_append_data()
After commit d625f520d1ee ("ipv4: factorize sk_wmem_alloc updates
done by __ip_append_data()") and commit 6629e79b1cbb ("ipv6:
factorize sk_wmem_alloc updates done by __ip6_append_data()"),
when transmitting sub MTU datagram, an addtional, unneeded atomic
operation is performed in ip*_append_data() to update wmem_alloc:
in the above condition the delta is 0.
The above cause small but measurable performance regression in UDP
xmit tput test with packet size below MTU.
This change avoids such overhead updating wmem_alloc only if
wmem_alloc_delta is non zero.
The error path is left intentionally unmodified: it's a slow path
and simplicity is preferred to performances.
Fixes: d625f520d1ee ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()") Fixes: 6629e79b1cbb ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The new of_get_nvmem_mac_address() helper function causes a link error
with CONFIG_NVMEM=m:
drivers/of/of_net.o: In function `of_get_nvmem_mac_address':
of_net.c:(.text+0x168): undefined reference to `of_nvmem_cell_get'
of_net.c:(.text+0x19c): undefined reference to `nvmem_cell_read'
of_net.c:(.text+0x1a8): undefined reference to `nvmem_cell_put'
I could not come up with a good solution for this, as the code is always
built-in. Using an #if IS_REACHABLE() check around it would solve the
link time issue but then stop it from working in that configuration.
Making of_nvmem_cell_get() an inline function could also solve that, but
seems a bit ugly since it's somewhat larger than most inline functions,
and it would just bring that problem into the callers. Splitting the
function into a separate file might be an alternative.
This uses the big hammer by making CONFIG_NVMEM itself a 'bool' symbol,
which avoids the problem entirely but makes the vmlinux larger for anyone
that might use NVMEM support but doesn't need it built-in otherwise.
Fixes: fb2f710726f1 ("of_net: Implement of_get_nvmem_mac_address helper") Cc: Mike Looijmans <mike.looijmans@topic.nl> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mike Looijmans Signed-off-by: David S. Miller <davem@davemloft.net>
Tan Xiaojun [Wed, 4 Apr 2018 09:40:48 +0000 (17:40 +0800)]
net: hns3: fix length overflow when CONFIG_ARM64_64K_PAGES
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length is u16, it will overflow. So change it
to u32.
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The NSP default buffer is a piece of NFP memory where additional
command data can be placed. Its format has been copied from
host buffer, but the PCIe selection bits do not make sense in
this case. If those get masked out from a NFP address - writes
to random place in the chip memory may be issued and crash the
device.
Even in the general NSP buffer case, it doesn't make sense to have the
PCIe selection bits there anymore. These are unused at the moment, and
when it becomes necessary, the PCIe selection bits should rather be
moved to another register to utilise more bits for the buffer address.
This has never been an issue because the buffer used to be
allocated in memory with less-than-38-bit-long address but that
is about to change.
Fixes: d7dbd1e2d5f8 ("nfp: add support for service processor access") Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Graf [Tue, 3 Apr 2018 22:19:35 +0000 (00:19 +0200)]
lan78xx: Connect phy early
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:
The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.
The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.
With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 2 Apr 2018 20:31:20 +0000 (13:31 -0700)]
nfp: add a separate counter for packets with CHECKSUM_COMPLETE
We are currently counting packets with CHECKSUM_COMPLETE as
"hw_rx_csum_ok". This is confusing. Add a new counter.
To make sure it fits in the same cacheline move the less used
error counter to a different location.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Tue, 3 Apr 2018 17:11:19 +0000 (19:11 +0200)]
tipc: Fix missing list initializations in struct tipc_subscription
When an item of struct tipc_subscription is created, we fail to
initialize the two lists aggregated into the struct. This has so far
never been a problem, since the items are just added to a root
object by list_add(), which does not require the addee list to be
pre-initialized. However, syzbot is provoking situations where this
addition fails, whereupon the attempted removal if the item from
the list causes a crash.
This problem seems to always have been around, despite that the code
for creating this object was rewritten in commit 7cb0b36eac5d ("tipc:
collapse subscription creation functions"), which is still in net-next.
We fix this for that commit by initializing the two lists properly.
Fixes: 7cb0b36eac5d ("tipc: collapse subscription creation functions") Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ipv6: udp: set dst cache for a connected sk if current not valid
A new RTF_CACHE route can be created with the socket's dst cache
update between the below calls in udpv6_sendmsg(), when datagram
sending results to ICMPV6_PKT_TOOBIG error:
dst = ip6_sk_dst_lookup_flow(...)
...
release_dst:
if (dst) {
if (connected) {
ip6_dst_store(sk, dst)
Therefore, the new socket's dst cache reset to the old one on
"release_dst:".
The first three patches prepare the code to store dst cache
with ip6_sk_dst_lookup_flow():
* the first patch adds ip6_sk_dst_store_flow() function with
commonly used source and destiantion addresses checks using
the flow information.
* the second patch adds a new argument to ip6_sk_dst_lookup_flow()
and ability to store dst in the socket's cache. Also, the two
users of the function are updated without enabling the new
behavior: pingv6_sendmsg() and udpv6_sendmsg().
* the third patch makes 'connected' variable in udpv6_sendmsg()
to be consistent with ip6_sk_dst_store_flow(), changes its type
from int to bool.
The last patch contains the actual fix that removes sk dst cache
update in the end of udpv6_sendmsg(), and allows to do it in
ip6_sk_dst_lookup_flow().
v6: * use bool type for a new parameter in ip_sk_dst_lookup_flow()
* add one more patch to convert 'connected' variable in
udpv6_sendmsg() from int to bool type. If it shouldn't be
here I will resend it when the net-next is opened.
v5: * relocate ip6_sk_dst_store_flow() to net/ipv6/route.c and
rename ip6_dst_store_flow() to ip6_sk_dst_store_flow() as
suggested by Martin
v4: * fix the error in the build of ip_dst_store_flow() reported by
kbuild test robot due to missing checks for CONFIG_IPV6: add
new function to ip6_output.c instead of ip6_route.h
* add 'const' to struct flowi6 in ip6_dst_store_flow()
* minor commit messages fixes
v3: * instead of moving ip6_dst_store() above udp_v6_send_skb(),
update socket's dst cache inside ip6_sk_dst_lookup_flow()
if the current one is invalid
* the issue not reproduced in 4.1, but starting from 4.2. Add
one more 'Fixes:' commit that creates new RTF_CACHE route.
Though, it is also mentioned in the first one
====================
Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: udp: set dst cache for a connected sk if current not valid
A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow()
and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending
results to ICMPV6_PKT_TOOBIG error:
udp_v6_send_skb(), for example with vti6 tunnel:
vti6_xmit(), get ICMPV6_PKT_TOOBIG error
skb_dst_update_pmtu(), can create a RTF_CACHE clone
icmpv6_send()
...
udpv6_err()
ip6_sk_update_pmtu()
ip6_update_pmtu(), can create a RTF_CACHE clone
...
ip6_datagram_dst_update()
ip6_dst_store()
And after commit df5a397ceae2 ("ipv6: datagram: Update dst cache of
a connected datagram sk during pmtu update"), the UDPv6 error handler
can update socket's dst cache, but it can happen before the update in
the end of udpv6_sendmsg(), preventing getting the new dst cache on
the next udpv6_sendmsg() calls.
In order to fix it, save dst in a connected socket only if the current
socket's dst cache is invalid.
The previous patch prepared ip6_sk_dst_lookup_flow() to do that with
the new argument, and this patch enables it in udpv6_sendmsg().
Fixes: df5a397ceae2 ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update") Fixes: 0c8dfdb319ff ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: add a wrapper for ip6_dst_store() with flowi6 checks
Move commonly used pattern of ip6_dst_store() usage to a separate
function - ip6_sk_dst_store_flow(), which will check the addresses
for equality using the flow information, before saving them.
There is no functional changes in this patch. In addition, it will
be used in the next patch, in ip6_sk_dst_lookup_flow().
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 3 Apr 2018 09:31:45 +0000 (10:31 +0100)]
net: phy: marvell10g: add thermal hwmon device
Add a thermal monitoring device for the Marvell 88x3310, which updates
once a second. We also need to hook into the suspend/resume mechanism
to ensure that the thermal monitoring is reconfigured when we resume.
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 3 Apr 2018 01:48:37 +0000 (18:48 -0700)]
pptp: remove a buggy dst release in pptp_connect()
Once dst has been cached in socket via sk_setup_caps(),
it is illegal to call ip_rt_put() (or dst_release()),
since sk_setup_caps() did not change dst refcount.
We can still dereference it since we hold socket lock.
Caugth by syzbot :
BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline]
BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185
Write of size 4 at addr ffff8801c54dc040 by task syz-executor4/20088
net: dsa: mt7530: Use NULL instead of plain integer
We would be passing 0 instead of NULL as the rsp argument to
mt7530_fdb_cmd(), fix that.
Fixes: b2cac9613626 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in
initializer (different address spaces)
drivers/net/dsa/b53/b53_mmap.c:33:31: expected unsigned char
[noderef] [usertype] <asn:2>*regs
drivers/net/dsa/b53/b53_mmap.c:33:31: got void *priv
and indeed, while what we are doing is functional, we are dereferencing
a void * pointer into a void __iomem * which is not great. Just use the
defined b53_mmap_priv structure which holds our register base and use
that.
Fixes: c29d61c3ee43 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 2 Apr 2018 18:01:27 +0000 (11:01 -0700)]
af_unix: remove redundant lockdep class
After commit c5a3756c21eb ("net/socket: use per af lockdep classes for sk queues")
sock queue locks now have per-af lockdep classes, including unix socket.
It is no longer necessary to workaround it.
I noticed this while looking at a syzbot deadlock report, this patch
itself doesn't fix it (this is why I don't add Reported-by).
Fixes: c5a3756c21eb ("net/socket: use per af lockdep classes for sk queues") Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series fixes the same warning reported by sparse in bcmsysport and
bcmgenet in the code that deals with inserting the TX checksum pointers:
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types)
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: expected unsigned short [unsigned] [usertype] val
drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: got restricted __be16 [usertype] protocol
This patch fixes both issues by using the same construct and not swapping
skb->protocol but instead the values we are checking against.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()
skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.
Fixes: 2912eb7400ef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
skb->protocol is a __be16 which we would be calling htons() against,
while this is not wrong per-se as it correctly results in swapping the
value on LE hosts, this still upsets sparse. Adopt a similar pattern to
what other drivers do and just assign ip_ver to skb->protocol, and then
use htons() against the different constants such that the compiler can
resolve the values at build time.
Fixes: 390dff99da87 ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Mon, 2 Apr 2018 22:51:39 +0000 (23:51 +0100)]
rxrpc: Fix undefined packet handling
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
should just be discarded rather than being aborted like other undefined
packet types.
Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
arm has an optional MULTI_IRQ_HANDLER, which openrisc copied but didn't
make optional. The multi irq handler infrastructure has been copied to
generic code selectable with a new config symbol. That symbol can be
selected by randconfig builds and can cause build breakage.
Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents
the core config symbol from being selected. The openrisc local config
symbol will be removed once openrisc gets converted to the generic code.
arm has an optional MULTI_IRQ_HANDLER, which arm64 copied but didn't make
optional. The multi irq handler infrastructure has been copied to generic
code selectable with a new config symbol. That symbol can be selected by
randconfig builds and can cause build breakage.
Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents
the core config symbol from being selected. The arm64 local config symbol
will be removed once arm64 gets converted to the generic code.
genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
These config switches enable the same code in the core and the not yet
converted architecture code. They can be selected both by randconfig builds
and cause linker error because the same symbols are defined twice.
Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to
prevent that. The dependency will be removed once all architectures are
converted over.
Anders Roxell [Tue, 3 Apr 2018 12:09:47 +0000 (14:09 +0200)]
kernel/bpf/syscall: fix warning defined but not used
There will be a build warning -Wunused-function if CONFIG_CGROUP_BPF
isn't defined, since the only user is inside #ifdef CONFIG_CGROUP_BPF:
kernel/bpf/syscall.c:1229:12: warning: ‘bpf_prog_attach_check_attach_type’
defined but not used [-Wunused-function]
static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current code moves function bpf_prog_attach_check_attach_type inside
ifdef CONFIG_CGROUP_BPF.
Fixes: fab05fade032 ("bpf: Check attach type at prog load time") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Mon, 2 Apr 2018 19:50:52 +0000 (12:50 -0700)]
bpf: sockmap, duplicates release calls may NULL sk_prot
It is possible to have multiple ULP tcp_release call paths in flight
if a sock is closed and simultaneously being removed from the sockmap
control path. The result would be setting the sk_prot to the saved
values on the first iteration and then on the second iteration setting
the value to NULL.
This patch resolves this by ensuring we only reset the sk_prot pointer
if we have a valid saved state to set.
Fixes: 94d211d69db27 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Mon, 2 Apr 2018 19:50:46 +0000 (12:50 -0700)]
bpf: sockmap, free memory on sock close with cork data
If a socket with pending cork data is closed we do not return the
memory to the socket until the garbage collector free's the psock
structure. The garbage collector though can run after the sock has
completed its close operation. If this ordering happens the sock code
will through a WARN_ON because there is still outstanding memory
accounted to the sock.
To resolve this ensure we return memory to the sock when a socket
is closed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: 1bfdc86924ae ("bpf: sockmap, add msg_cork_bytes() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
"There was a lot of work this cycle fixing bugs that were discovered
after the merge window and getting everything ready where we can
reasonably support fully unprivileged fuse. The bug fixes you already
have and much of the unprivileged fuse work is coming in via other
trees.
Still left for fully unprivileged fuse is figuring out how to cleanly
handle .set_acl and .get_acl in the legacy case, and properly handling
of evm xattrs on unprivileged mounts.
Included in the tree is a cleanup from Alexely that replaced a linked
list with a statically allocated fix sized array for the pid caches,
which simplifies and speeds things up.
Then there is are some cleanups and fixes for the ipc namespace. The
motivation was that in reviewing other code it was discovered that
access ipc objects from different pid namespaces recorded pids in such
a way that when asked the wrong pids were returned. In the worst case
there has been a measured 30% performance impact for sysvipc
semaphores. Other test cases showed no measurable performance impact.
Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
performance both gave the nod that this is good enough.
Casey Schaufler and James Morris have given their approval to the LSM
side of the changes.
I simplified the types and the code dealing with sysvipc to pass just
kern_ipc_perm for all three types of ipc. Which reduced the header
dependencies throughout the kernel and simplified the lsm code.
Which let me work on the pid fixes without having to worry about
trivial changes causing complete kernel recompiles"
* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ipc/shm: Fix pid freeing.
ipc/shm: fix up for struct file no longer being available in shm.h
ipc/smack: Tidy up from the change in type of the ipc security hooks
ipc: Directly call the security hook in ipc_ops.associate
ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
ipc/util: Helpers for making the sysvipc operations pid namespace aware
ipc: Move IPCMNI from include/ipc.h into ipc/util.h
msg: Move struct msg_queue into ipc/msg.c
shm: Move struct shmid_kernel into ipc/shm.c
sem: Move struct sem and struct sem_array into ipc/sem.c
msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
pidns: simpler allocation of pid_* caches