Trond Myklebust [Sun, 10 Mar 2019 15:17:13 +0000 (11:17 -0400)]
SUNRPC: Check whether the task was transmitted before rebind/reconnect
Before initiating transport actions that require putting the task to sleep,
such as rebinding or reconnecting, we should check whether or not the task
was already transmitted.
Trond Myklebust [Tue, 5 Mar 2019 12:30:48 +0000 (07:30 -0500)]
SUNRPC: Fix up RPC back channel transmission
Now that transmissions happen through a queue, we require the RPC tasks
to handle error conditions that may have been set while they were
sleeping. The back channel does not currently do this, but assumes
that any error condition happens during its own call to xprt_transmit().
The solution is to ensure that the back channel splits out the
error handling just like the forward channel does.
Fixes: 402368d6c65b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 4 Mar 2019 19:19:31 +0000 (14:19 -0500)]
SUNRPC: Prevent thundering herd when the socket is not connected
If the socket is not connected, then we want to initiate a reconnect
rather that trying to transmit requests. If there is a large number
of requests queued and waiting for the lock in call_transmit(),
then it can take a while for one of the to loop back and retake
the lock in call_connect.
Fixes: 402368d6c65b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Tue, 26 Feb 2019 18:52:50 +0000 (13:52 -0500)]
NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive
If the DS is unresponsive, we want to just mark it as such, while
reporting the errors. If the server later returns the same deviceid
in a new layout, then we don't want to have to look it up again.
In ff_layout_mirror_valid() we may not want to invalidate the layout
segment despite the call to GETDEVICEINFO failing. The reason is that
a read may still be able to make progress on another mirror.
So instead we let the caller (in this case nfs4_ff_layout_prepare_ds())
decide whether or not it needs to invalidate.
Trond Myklebust [Thu, 14 Feb 2019 18:45:45 +0000 (13:45 -0500)]
NFS/flexfiles: refactor calls to fs4_ff_layout_prepare_ds()
While we may want to skip attempting to connect to a downed mirror
when we're deciding which mirror to select for a read, we do not
want to do so once we've committed to attempting the I/O in
ff_layout_read/write_pagelist(), or ff_layout_initiate_commit()
Trond Myklebust [Mon, 11 Feb 2019 03:38:43 +0000 (22:38 -0500)]
NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads
When a read to the preferred mirror returns an error, the flexfiles
driver records the error in the inode list and currently marks the
layout for return before failing over the attempted read to the next
mirror.
What we actually want to do is fire off a LAYOUTERROR to notify the
MDS that there is an issue with the preferred mirror, then we fail
over. Only once we've failed to read from all mirrors should we
return the layout.
Trond Myklebust [Wed, 27 Feb 2019 20:37:36 +0000 (15:37 -0500)]
NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated
If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). We may end up forced to spam the MDS with a
lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
Trond Myklebust [Tue, 26 Feb 2019 16:19:46 +0000 (11:19 -0500)]
NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()
If the attempt to instantiate the mirror's layout DS pointer failed,
then that pointer may hold a value of type ERR_PTR(), so we need
to check that before we dereference it.
Fixes: eae9c36784e47 ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Anna Schumaker [Fri, 1 Mar 2019 21:09:56 +0000 (16:09 -0500)]
NFS: Add missing encode / decode sequence_maxsz to v4.2 operations
These really should have been there from the beginning, but we never
noticed because there was enough slack in the RPC request for the extra
bytes. Chuck's recent patch to use au_cslack and au_rslack to compute
buffer size shrunk the buffer enough that this was now a problem for
SEEK operations on my test client.
Trond Myklebust [Fri, 1 Mar 2019 17:13:34 +0000 (12:13 -0500)]
NFSv4.1: Reinitialise sequence results before retransmitting a request
If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.
Trond Myklebust [Fri, 22 Feb 2019 19:20:27 +0000 (14:20 -0500)]
NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
If a bulk layout recall or a metadata server reboot coincides with a
umount, then holding a reference to an inode is unsafe unless we
also hold a reference to the super block.
Fixes: ae05458216e26 ("NFSv4.1: Fix bulk recall and destroy of layouts") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 21 Feb 2019 19:51:25 +0000 (14:51 -0500)]
NFS: Fix a soft lockup in the delegation recovery code
Fix a soft lockup when NFS client delegation recovery is attempted
but the inode is in the process of being freed. When the
igrab(inode) call fails, and we have to restart the recovery process,
we need to ensure that we won't attempt to recover the same delegation
again.
Fixes: 21f14ac32bfb6 ("NFSv4.1: Test delegation stateids when server...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Wed, 20 Jun 2018 21:53:34 +0000 (17:53 -0400)]
NFSv4.1: Avoid false retries when RPC calls are interrupted
A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a
new RPC call using a slot+sequence number combination that references an
already cached one. Currently, the Linux NFS client will do this if a
user process interrupts an RPC call that is in progress.
The problem with doing so is that we defeat the main mechanism used by
the server to differentiate between a new call and a replayed one. Even
if the server is able to perfectly cache the arguments of the old call,
it cannot know if the client intended to replay or send a new call.
The obvious fix is to bump the sequence number pre-emptively if an
RPC call is interrupted, but in order to deal with the corner cases
where the interrupted call is not actually received and processed by
the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED
as a sign that we need to either wait or locate a correct sequence
number that lies between the value we sent, and the last value that
was acked by a SEQUENCE call on that slot.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
Trond Myklebust [Tue, 19 Feb 2019 18:13:40 +0000 (13:13 -0500)]
SUNRPC: Further cleanups of xs_sendpages()
Now that we send the pages using a struct msghdr, instead of
using sendpage(), we no longer need to 'prime the socket' with
an address for unconnected UDP messages.
Trond Myklebust [Sat, 16 Feb 2019 14:31:47 +0000 (09:31 -0500)]
SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receive
If the client stream receive code receives an ESHUTDOWN error either
because the server closed the connection, or because it sent a
callback which cannot be processed, then we should shut down
the connection.
Trond Myklebust [Wed, 20 Feb 2019 19:56:20 +0000 (14:56 -0500)]
SUNRPC: Don't reset the stream record info when the receive worker is running
To ensure that the receive worker has exclusive access to the stream record
info, we must not reset the contents other than when holding the
transport->recv_mutex.
ZhangXiaoxu [Mon, 18 Feb 2019 14:56:43 +0000 (22:56 +0800)]
nfs: fix xfstest generic/099 failed on nfsv3
After setxattr, the nfsv3 cached the acl which set by user.
But at the backend, the shared file system (eg. ext4) will check
the acl, if it can merged with mode, it won't add acl to the file.
So, the nfsv3 cached acl is redundant.
Kazuo Ito [Thu, 14 Feb 2019 09:39:03 +0000 (18:39 +0900)]
pNFS: Avoid read/modify/write when it is not necessary
As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(e0b84cb3dfd78 pnfs: add flag to force read-modify-write in ->write_begin)
The current code tries to see if we have to do read-modify-write
on block-oriented pNFS layouts by just checking !PageUptodate(page),
but the same condition also applies for overwriting of any uncached
potions of existing files, making such operations excessively slow
even it is block-aligned.
The change does not affect the optimization for modify-write-read
cases (548f3e70985b0 NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.
Testing results:
We ran fio on one of the pNFS clients running 4.20 kernel
(vanilla and patched) in this configuration to read/write/overwrite
files on the storage array, exported as pnfs share by the server.
Kazuo Ito [Thu, 14 Feb 2019 09:36:58 +0000 (18:36 +0900)]
pNFS: Fix potential corruption of page being written
nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS
block or SCSI layout was in use, therefore we could lose data forever
if the page being written was filled by a read before completion.
Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
luanshi [Tue, 29 Jan 2019 07:34:17 +0000 (15:34 +0800)]
NFS: readdirplus optimization by cache mechanism
When listing very large directories via NFS, clients may take a long
time to complete. There are about three factors involved:
First of all, ls and practically every other method of listing a
directory including python os.listdir and find rely on libc readdir().
However readdir() only reads 32K of directory entries at a time, which
means that if you have a lot of files in the same directory, it is going
to take an insanely long time to read all the directory entries.
Secondly, libc readdir() reads 32K of directory entries at a time, in
kernel space 32K buffer split into 8 pages. One NFS readdirplus rpc will
be called for one page, which introduces many readdirplus rpc calls.
Lastly, one NFS readdirplus rpc asks for 32K data (filled by nfs_dentry)
to fill one page (filled by dentry), we found that nearly one third of
data was wasted.
To solve above problems, pagecache mechanism was introduced. One NFS
readdirplus rpc will ask for a large data (more than 32k), the data can
fill more than one page, the cached pages can be used for next readdir
call. This can reduce many readdirplus rpc calls and improve readdirplus
performance.
TESTING:
When listing very large directories(include 300 thousand files) via NFS
time ls -l /nfs_mount | wc -l
without the patch:
300001
real 1m53.524s
user 0m2.314s
sys 0m2.599s
with the patch:
300001
real 0m23.487s
user 0m2.305s
sys 0m2.558s
fs/nfs: Fix nfs_parse_devname to not modify it's argument
In the rare and unsupported case of a hostname list nfs_parse_devname
will modify dev_name. There is no need to modify dev_name as the all
that is being computed is the length of the hostname, so the computed
length can just be shorted.
Fixes: 5bb8bfece35b ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Julia Lawall [Sun, 23 Dec 2018 08:57:10 +0000 (09:57 +0100)]
NFS: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares has never
been used.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 50a45beb5c3b ("NFSv4.1 Use MDS auth flavor for data server connection") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Wed, 30 Jan 2019 19:51:26 +0000 (14:51 -0500)]
SUNRPC: Use poll() to fix up the socket requeue races
Because we clear XPRT_SOCK_DATA_READY before reading, we can end up
with a situation where new data arrives, causing xs_data_ready() to
queue up a second receive worker job for the same socket, which then
immediately gets stuck waiting on the transport receive mutex.
The fix is to only clear XPRT_SOCK_DATA_READY once we're done reading,
and then to use poll() to check if we might need to queue up a new
job in order to deal with any new data.
Trond Myklebust [Mon, 18 Feb 2019 15:02:29 +0000 (10:02 -0500)]
SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs
Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we
ensure memory allocations for asynchronous rpc calls don't ever end
up recursing back to the NFS layer for memory reclaim.
Trond Myklebust [Mon, 18 Feb 2019 18:06:54 +0000 (13:06 -0500)]
NFS: Ensure NFS writeback allocations don't recurse back into NFS.
All the allocations that we can hit in the NFS layer and sunrpc layers
themselves are already marked as GFP_NOFS, but we need to ensure that
any calls to generic kernel functionality do the right thing as well.
Trond Myklebust [Wed, 13 Feb 2019 15:39:39 +0000 (10:39 -0500)]
NFS: Pass error information to the pgio error cleanup routine
Allow the caller to pass error information when cleaning up a failed
I/O request so that we can conditionally take action to cancel the
request altogether if the error turned out to be fatal.
Trond Myklebust [Wed, 13 Feb 2019 14:21:38 +0000 (09:21 -0500)]
NFS: Fix I/O request leakages
When we fail to add the request to the I/O queue, we currently leave it
to the caller to free the failed request. However since some of the
requests that fail are actually created by nfs_pageio_add_request()
itself, and are not passed back the caller, this leads to a leakage
issue, which can again cause page locks to leak.
This commit addresses the leakage by freeing the created requests on
error, using desc->pg_completion_ops->error_cleanup()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 83640929b2cac ("nfs: add mirroring support to pgio layer") Cc: stable@vger.kernel.org # v4.0: 15d3e65468a1: nfs: clean up rest of reqs Cc: stable@vger.kernel.org # v4.0: df5de871ea1d: NFS41: pop some layoutget Cc: stable@vger.kernel.org # v4.0+
Linus Torvalds [Wed, 20 Feb 2019 17:42:52 +0000 (09:42 -0800)]
Merge tag 'sound-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are a few last-minute fixes for 5.0.
The most significant one is the OF-node refcount fix for ASoC
simple-card, which could be triggered on many boards. Another fix for
ASoC core is for the error handling in topology, while others are
device-specific fixes for Samsung and HD-audio"
* tag 'sound-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: simple-card: fixup refcount_t underflow
ASoC: topology: free created components in tplg load error
ALSA: hda/realtek: Disable PC beep in passthrough on alc285
ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
Linus Torvalds [Wed, 20 Feb 2019 17:39:53 +0000 (09:39 -0800)]
Merge tag 'pinctrl-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some final pin control fixes (I hope) to round off the v5.0 pin
control development cycle.
Only driver fixes, one for stable:
- Meson8B fixup for the sdc pins
- Fix SDC tile position for Qualcomm QCS404"
* tag 'pinctrl-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
pinctrl: qcom: qcs404: Correct SDC tile
Linus Torvalds [Wed, 20 Feb 2019 17:36:33 +0000 (09:36 -0800)]
Merge tag 'gpio-v5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Two GPIO fixes for the v5.0 series:
- Per-instance irqchip on the MT7621
- Avoid direction setting using pin control on MMP2"
* tag 'gpio-v5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
gpio: MT7621: use a per instance irq_chip structure
Linus Torvalds [Wed, 20 Feb 2019 17:16:11 +0000 (09:16 -0800)]
Merge tag 'mtd/fixes-for-5.0-rc8' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Boris Brezillon:
- Don't add a digit to MTD-backed nvmem device names
- Make sure powernv flash names are unique
* tag 'mtd/fixes-for-5.0-rc8' of git://git.infradead.org/linux-mtd:
mtd: powernv_flash: Fix device registration error
mtd: Use mtd->name when registering nvmem device
Linus Torvalds [Wed, 20 Feb 2019 17:09:33 +0000 (09:09 -0800)]
Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull keys fixes from James Morris:
- Handle quotas better, allowing full quota to be reached.
- Fix the creation of shortcuts in the assoc_array internal
representation when the index key needs to be an exact multiple of
the machine word size.
- Fix a dependency loop between the request_key contruction record and
the request_key authentication key. The construction record isn't
really necessary and can be dispensed with.
- Set the timestamp on a new key rather than leaving it as 0. This
would ordinarily be fine - provided the system clock is never set to
a time before 1970
* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
keys: Timestamp new keys
keys: Fix dependency loop between construction record and auth key
assoc_array: Fix shortcut creation
KEYS: allow reaching the keys quotas exactly
1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw
Gruszka.
2) Missing memory barriers in xsk, from Magnus Karlsson.
3) rhashtable fixes in mac80211 from Herbert Xu.
4) 32-bit MIPS eBPF JIT fixes from Paul Burton.
5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens.
6) GSO validation fixes from Willem de Bruijn.
7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue.
8) More strict checks in tcp_v4_err(), from Eric Dumazet.
9) af_alg_release should NULL out the sk after the sock_put(), from Mao
Wenan.
10) Missing unlock in mac80211 mesh error path, from Wei Yongjun.
11) Missing device put in hns driver, from Salil Mehta.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
sky2: Increase D3 delay again
vhost: correctly check the return value of translate_desc() in log_used()
net: netcp: Fix ethss driver probe issue
net: hns: Fixes the missing put_device in positive leg for roce reset
net: stmmac: Fix a race in EEE enable callback
qed: Fix iWARP syn packet mac address validation.
qed: Fix iWARP buffer size provided for syn packet processing.
r8152: Add support for MAC address pass through on RTL8153-BD
mac80211: mesh: fix missing unlock on error in table_path_del()
net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
net: crypto set sk to NULL when af_alg_release.
net: Do not allocate page fragments that are not skb aligned
mm: Use fixed constant in page_frag_alloc instead of size + 1
tcp: tcp_v4_err() should be more careful
tcp: clear icsk_backoff in tcp_write_queue_purge()
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
qmi_wwan: apply SET_DTR quirk to Sierra WP7607
net: stmmac: handle endianness in dwmac4_get_timestamp
doc: Mention MSG_ZEROCOPY implementation for UDP
mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
...
Kai-Heng Feng [Tue, 19 Feb 2019 15:45:29 +0000 (23:45 +0800)]
sky2: Increase D3 delay again
Another platform requires even longer delay to make the device work
correctly after S3.
So increase the delay to 300ms.
BugLink: https://bugs.launchpad.net/bugs/1798921 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Tue, 19 Feb 2019 06:53:44 +0000 (14:53 +0800)]
vhost: correctly check the return value of translate_desc() in log_used()
When fail, translate_desc() returns negative value, otherwise the
number of iovs. So we should fail when the return value is negative
instead of a blindly check against zero.
Detected by CoverityScan, CID# 1442593: Control flow issues (DEADCODE)
Fixes: 60b44f4c6dce ("vhost: log dirty page correctly") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Tue, 19 Feb 2019 11:35:55 +0000 (12:35 +0100)]
Merge tag 'asoc-fix-v5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.0
A few small fixes, a driver fix for Samsung, a fix for refcounting of
of_nodes in the simple-card driver that triggered on a lot of systems
and a fix for topology error handling.
Murali Karicheri [Mon, 18 Feb 2019 20:10:51 +0000 (15:10 -0500)]
net: netcp: Fix ethss driver probe issue
Recent commit below has introduced a bug in netcp driver that causes
the ethss driver probe failure and thus break the networking function
on K2 SoCs such as K2HK, K2L, K2E etc. This patch fixes the issue to
restore networking on the above SoCs.
Fixes: 00130fa4243a ("net: ethernet: Convert to using %pOFn instead of device_node.name") Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Mon, 18 Feb 2019 17:40:32 +0000 (17:40 +0000)]
net: hns: Fixes the missing put_device in positive leg for roce reset
This patch fixes the missing device reference release-after-use in
the positive leg of the roce reset API of the HNS DSAF.
Fixes: dce1a874afa3 ("net: hns: Fix object reference leaks in hns_dsaf_roce_reset()") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Mon, 18 Feb 2019 13:35:03 +0000 (14:35 +0100)]
net: stmmac: Fix a race in EEE enable callback
We are saving the status of EEE even before we try to enable it. This
leads to a race with XMIT function that tries to arm EEE timer before we
set it up.
Fix this by only saving the EEE parameters after all operations are
performed with success.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Fixes: 6d1634718fa8 ("stmmac: add the Energy Efficient Ethernet support") Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon [Mon, 18 Feb 2019 13:24:03 +0000 (15:24 +0200)]
qed: Fix iWARP syn packet mac address validation.
The ll2 forwards all syn packets to the driver without validating the mac
address. Add validation check in the driver's iWARP listener flow and drop
the packet if it isn't intended for the device.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon [Mon, 18 Feb 2019 13:24:02 +0000 (15:24 +0200)]
qed: Fix iWARP buffer size provided for syn packet processing.
The assumption that the maximum size of a syn packet is 128 bytes
is wrong. Tunneling headers were not accounted for.
Allocate buffers large enough for mtu.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 19 Feb 2019 00:36:48 +0000 (16:36 -0800)]
exec: load_script: Do not exec truncated interpreter path
Commit a1f041b053d2 ("exec: load_script: don't blindly truncate
shebang string") was trying to protect against a confused exec of a
truncated interpreter path. However, it was overeager and also refused
to truncate arguments as well, which broke userspace, and it was
reverted. This attempts the protection again, but allows arguments to
remain truncated. In an effort to improve readability, helper functions
and comments have been added.
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Samuel Dionne-Riel <samuel@dionne-riel.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Graham Christensen <graham@grahamc.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Chen [Sat, 16 Feb 2019 09:16:42 +0000 (17:16 +0800)]
r8152: Add support for MAC address pass through on RTL8153-BD
RTL8153-BD is used in Dell DA300 type-C dongle.
It should be added to the whitelist of devices to activate MAC address
pass through.
Per confirming with Realtek all devices containing RTL8153-BD should
activate MAC pass through and there won't use pass through bit on efuse
like in RTL8153-AD.
Signed-off-by: David Chen <david.chen7@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 18 Feb 2019 10:29:29 +0000 (11:29 +0100)]
mac80211: mesh: fix missing unlock on error in table_path_del()
spin_lock_bh() is used in table_path_del() but rcu_read_unlock()
is used for unlocking. Fix it by using spin_unlock_bh() instead
of rcu_read_unlock() in the error handling case.
Fixes: be104a6309a1 ("mac80211: Use linked list instead of rhashtable walk for mesh tables") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a en_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Mon, 18 Feb 2019 02:44:44 +0000 (10:44 +0800)]
net: crypto set sk to NULL when af_alg_release.
KASAN has found use-after-free in sockfs_setattr.
The existed commit 1f3bee5d978f ("socket: close race condition between sock_close()
and sockfs_setattr()") is to fix this simillar issue, but it seems to ignore
that crypto module forgets to set the sk to NULL after af_alg_release.
KASAN report details as below:
BUG: KASAN: use-after-free in sockfs_setattr+0x120/0x150
Write of size 4 at addr ffff88837b956128 by task syz-executor0/4186
Fixes: 1f3bee5d978f ("socket: close race condition between sock_close() and sockfs_setattr()") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8ed00e5e8316e ("ASoC: simple-card: merge simple-scu-card")
merged simple-card and simple-scu-card. Then it had refcount
underflow bug. This patch fixup it.
We will get below error without this patch.
OF: ERROR: Bad of_node_put() on /sound
CPU: 3 PID: 237 Comm: kworker/3:1 Not tainted 5.0.0-rc6+ #1514
Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
Workqueue: events deferred_probe_work_func
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x24/0x30
dump_stack+0xb0/0xec
of_node_release+0xd0/0xd8
kobject_put+0x74/0xe8
of_node_put+0x24/0x30
__of_get_next_child+0x50/0x70
of_get_next_child+0x40/0x68
asoc_simple_card_probe+0x604/0x730
platform_drv_probe+0x58/0xa8
... Reported-by: Vicente Bergas <vicencb@gmail.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Mon, 18 Feb 2019 17:40:16 +0000 (09:40 -0800)]
Merge tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more tracing fixes
- Have kprobes not use copy_from_user() to access kernel addresses,
because kprobes can legitimately poke at bad kernel memory, which
will fault. Copy from user code should never fault in kernel space.
Using probe_mem_read() can handle kernel address space faulting.
- Put back the entries counter in the tracing output that was
accidentally removed"
* tag 'trace-v5.0-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix number of entries in trace header
kprobe: Do not use uaccess functions to access kernel memory that can fault
David S. Miller [Sun, 17 Feb 2019 23:48:43 +0000 (15:48 -0800)]
Merge branch 'netdev-page_frag_alloc-fixes'
Alexander Duyck says:
====================
Address recent issues found in netdev page_frag_alloc usage
This patch set addresses a couple of issues that I had pointed out to Jann
Horn in response to a recent patch submission.
The first issue is that I wanted to avoid the need to read/modify/write the
size value in order to generate the value for pagecnt_bias. Instead we can
just use a fixed constant which reduces the need for memory read operations
and the overall number of instructions to update the pagecnt bias values.
The other, and more important issue is, that apparently we were letting tun
access the napi_alloc_cache indirectly through netdev_alloc_frag and as a
result letting it create unaligned accesses via unaligned allocations. In
order to prevent this I have added a call to SKB_DATA_ALIGN for the fragsz
field so that we will keep the offset in the napi_alloc_cache
SMP_CACHE_BYTES aligned.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 15 Feb 2019 22:44:18 +0000 (14:44 -0800)]
net: Do not allocate page fragments that are not skb aligned
This patch addresses the fact that there are drivers, specifically tun,
that will call into the network page fragment allocators with buffer sizes
that are not cache aligned. Doing this could result in data alignment
and DMA performance issues as these fragment pools are also shared with the
skb allocator and any other devices that will use napi_alloc_frags or
netdev_alloc_frags.
Fixes: 4d76151be427 ("net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 15 Feb 2019 22:44:12 +0000 (14:44 -0800)]
mm: Use fixed constant in page_frag_alloc instead of size + 1
This patch replaces the size + 1 value introduced with the recent fix for 1
byte allocs with a constant value.
The idea here is to reduce code overhead as the previous logic would have
to read size into a register, then increment it, and write it back to
whatever field was being used. By using a constant we can avoid those
memory reads and arithmetic operations in favor of just encoding the
maximum value into the operation itself.
Fixes: 82bec6ae0e97 ("mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tcp: fix possible crash in tcp_v4_err()
soukjin bae reported a crash in tcp_v4_err() that we
root caused to a missing initialization.
Second patch adds a sanity check in tcp_v4_err() to avoid
future potential problems. Ignoring an ICMP message
is probably better than crashing a machine.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>