]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
3 years agoDocumentation: fix sctp_wmem in ip-sysctl.rst
Xin Long [Thu, 21 Jul 2022 14:35:46 +0000 (10:35 -0400)]
Documentation: fix sctp_wmem in ip-sysctl.rst

[ Upstream commit d5935bd89dcc369c62c9ccb6d301b9866c013e9b ]

Since commit 18662ea47f2a ("sctp: implement memory accounting on tx path"),
SCTP has supported memory accounting on tx path where 'sctp_wmem' is used
by sk_wmem_schedule(). So we should fix the description for this option in
ip-sysctl.rst accordingly.

v1->v2:
  - Improve the description as Marcelo suggested.

Fixes: 18662ea47f2a ("sctp: implement memory accounting on tx path")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:26 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.

[ Upstream commit 8a546f9373e2f7b8da843bab80c843e4d72969f4 ]

While reading sysctl_tcp_invalid_ratelimit, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 7b805d602346 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotcp: Fix a data-race around sysctl_tcp_autocorking.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:25 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_autocorking.

[ Upstream commit 19ea5d82bf8a1597152f84b9887e1887cc5b3641 ]

While reading sysctl_tcp_autocorking, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 4aed7f27920e ("tcp: auto corking")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:24 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.

[ Upstream commit 4509ae5160b2decbd4fc8ddecadf403055cb3d24 ]

While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 50bc59fc3444 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotcp: Fix a data-race around sysctl_tcp_min_tso_segs.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:22 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.

[ Upstream commit 326666362e7146327c6fbb44affccae710494b93 ]

While reading sysctl_tcp_min_tso_segs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 6249207108ef ("tcp: TSO packets automatic sizing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agonet: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
Liang He [Wed, 20 Jul 2022 13:10:03 +0000 (21:10 +0800)]
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()

[ Upstream commit 6232f5fa0371a886b14f26e1fd5b31a037dc810b ]

In bcm5421_init(), we should call of_node_put() for the reference
returned by of_get_parent() which has increased the refcount.

Fixes: a97e737db55a ("[PATCH] ppc64: Add new PHY to sungem")
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20220720131003.1287426-1-windhl@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agonet: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
Vladimir Oltean [Wed, 20 Jul 2022 11:20:57 +0000 (14:20 +0300)]
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii

[ Upstream commit b205a7783aa346128d9f6bb0d5f35c68f4a42c71 ]

While phylink_pcs_ops :: pcs_get_state does return void, xpcs_get_state()
does check for a non-zero return code from xpcs_get_state_c37_sgmii()
and prints that as a message to the kernel log.

However, a non-zero return code from xpcs_read() is translated into
"return false" (i.e. zero as int) and the I/O error is therefore not
printed. Fix that.

Fixes: f5cf66c4e398 ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220720112057.3504398-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoigmp: Fix data-races around sysctl_igmp_qrv.
Kuniyuki Iwashima [Fri, 15 Jul 2022 17:17:44 +0000 (10:17 -0700)]
igmp: Fix data-races around sysctl_igmp_qrv.

[ Upstream commit d768c7366d87f8d9b8ca416cea4c8d77b1e46331 ]

While reading sysctl_igmp_qrv, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

This test can be packed into a helper, so such changes will be in the
follow-up series after net is merged into net-next.

  qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);

Fixes: 2b909d3ac006 ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agonet/tls: Remove the context from the list in tls_device_down
Maxim Mikityanskiy [Thu, 21 Jul 2022 09:11:27 +0000 (12:11 +0300)]
net/tls: Remove the context from the list in tls_device_down

commit b516be679337a105efd45cf987de2faca377743b upstream.

tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: bf5c2d483b13 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
Ziyang Xuan [Thu, 28 Jul 2022 01:33:07 +0000 (09:33 +0800)]
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr

commit f79339d4aab466eb877a24adb4dd6ab647e6811c upstream.

Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.

=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263

CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
 dump_backtrace+0x1a8/0x230
 show_stack+0x20/0x70
 dump_stack_lvl+0x68/0x84
 print_report+0xc4/0x120
 kasan_report+0x84/0x120
 __asan_load4+0x94/0xd0
 find_match.part.0+0x70/0x134
 __find_rr_leaf+0x408/0x470
 fib6_table_lookup+0x264/0x540
 ip6_pol_route+0xf4/0x260
 ip6_pol_route_output+0x58/0x70
 fib6_rule_lookup+0x1a8/0x330
 ip6_route_output_flags_noref+0xd8/0x1a0
 ip6_route_output_flags+0x58/0x160
 ip6_dst_lookup_tail+0x5b4/0x85c
 ip6_dst_lookup_flow+0x98/0x120
 rawv6_sendmsg+0x49c/0xc70
 inet_sendmsg+0x68/0x94

Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1

Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done

$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done

It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.

cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown

So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.

Fixes: f90a184598db ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet: ping6: Fix memleak in ipv6_renew_options().
Kuniyuki Iwashima [Thu, 28 Jul 2022 01:22:20 +0000 (18:22 -0700)]
net: ping6: Fix memleak in ipv6_renew_options().

commit d91e641693aaa94ee120680ca5d9fefaf08b430a upstream.

When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy().  As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.

    struct ipv6_sr_hdr *hdr;
    char data[24] = {0};
    int fd;

    hdr = (struct ipv6_sr_hdr *)data;
    hdr->hdrlen = 2;
    hdr->type = IPV6_SRCRT_TYPE_4;

    fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
    setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
    close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range.  The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
not succeed until we change the default value.  However, at least
Ubuntu/Fedora/RHEL loosen it.

    $ cat /usr/lib/sysctl.d/50-default.conf
    ...
    -net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.

  setsockopt
      IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
      IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
      IPV6_HOPOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDR (inet6_sk(sk)->opt)
      IPV6_DSTOPTS (inet6_sk(sk)->opt)
      IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

  getsockopt
      IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)

For the record, I left a different splat with syzbot's one.

  unreferenced object 0xffff888006270c60 (size 96):
    comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
    hex dump (first 32 bytes):
      01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
      [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
      [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
      [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
      [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
      [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6b4e34c00d7c ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoscsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
David Jeffery [Fri, 22 Jul 2022 14:24:48 +0000 (10:24 -0400)]
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown

commit a336cfcfb9803c7388d5a4a2c03606132f8e734e upstream.

During system shutdown or reboot, mpt3sas will reset the firmware back to
ready state. However, the driver leaves running a watchdog work item
intended to keep the firmware in operational state. This causes a second,
unneeded reset on shutdown and moves the firmware back to operational
instead of in ready state as intended. And if the mpt3sas_fwfault_debug
module parameter is set, this extra reset also panics the system.

mpt3sas's scsih_shutdown needs to stop the watchdog before resetting the
firmware back to ready state.

Link: https://lore.kernel.org/r/20220722142448.6289-1-djeffery@redhat.com
Fixes: 80df4e6869c9 ("scsi: mpt3sas: Transition IOC to Ready state during shutdown")
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoscsi: core: Fix warning in scsi_alloc_sgtables()
Jason Yan [Wed, 20 Jul 2022 02:51:20 +0000 (10:51 +0800)]
scsi: core: Fix warning in scsi_alloc_sgtables()

commit 22bb475b2b7fe8e27e94119eb5b32d7609279dd5 upstream.

As explained in SG_IO howto[1]:

"If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of
iov_len lengths. If not, the minimum of the two is the transfer length."

When iovec_count is non-zero and dxfer_len is zero, the sg_io() just
genarated a null bio, and finally caused a warning below. To fix it, skip
generating a bio for this request if dxfer_len is zero.

[1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html

WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Modules linked in:

CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted
5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014
RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8
93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41
   bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb
RSP: 0018:ffffc90000d07558 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000
RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000
R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000
FS:  00007effdec8e700(0000) GS:ffff88802cc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline]
 scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline]
 scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730
 blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851
 __blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299
 blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332
 __blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968
 __blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045
 blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096
 blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451
 blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231
 sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485
 scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline]
 scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921
 sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576
 blkdev_ioctl+0x37a/0x800 block/ioctl.c:588
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:874 [inline]
 __se_sys_ioctl fs/ioctl.c:860 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7effdecdc5d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9
RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004
RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8

Link: https://lore.kernel.org/r/20220720025120.3226770-1-yanaijie@huawei.com
Fixes: 09a5fc218fb8 ("block: fix SG_IO vector request data length handling")
Reported-by: syzbot+d44b35ecfb807e5af0b5@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:21 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.

commit a30eb41d36f05b4a53c9626157865b6f986fc89f upstream.

While reading sysctl_tcp_challenge_ack_limit, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: dbe400804f0b ("tcp: implement RFC 5961 3.2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:20 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.

commit 38bdf925a45ac7e2eec3767ddde4772755ff7956 upstream.

While reading sysctl_tcp_limit_output_bytes, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 6475c3971092 ("tcp: TCP Small Queues")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:18 +0000 (09:50 -0700)]
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.

commit 4cb69a9d96e068da38c4ce225da4c69e7022d70f upstream.

While reading sysctl_tcp_moderate_rcvbuf, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoocteontx2-pf: Fix UDP/TCP src and dst port tc filters
Subbaraya Sundeep [Sun, 24 Jul 2022 08:21:14 +0000 (13:51 +0530)]
octeontx2-pf: Fix UDP/TCP src and dst port tc filters

commit 3226b092612320b79e42c50ded5a577c0ab20cfa upstream.

Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.

Fixes: f5f1176ce016 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "tcp: change pingpong threshold to 3"
Wei Wang [Thu, 21 Jul 2022 20:44:04 +0000 (20:44 +0000)]
Revert "tcp: change pingpong threshold to 3"

commit 50b6b74acb305c445e0798100fd3a75ca6367b7b upstream.

This reverts commit f8c8f53658928f7ed32acb954b15ca82a906e085.

This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.

Fixes: f8c8f5365892 ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <hlm3280@163.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoscsi: ufs: host: Hold reference returned by of_parse_phandle()
Liang He [Tue, 19 Jul 2022 07:15:29 +0000 (15:15 +0800)]
scsi: ufs: host: Hold reference returned by of_parse_phandle()

commit c279b3eaf09ac458dea6af0d9770a86052c78651 upstream.

In ufshcd_populate_vreg(), we should hold the reference returned by
of_parse_phandle() and then use it to call of_node_put() for refcount
balance.

Link: https://lore.kernel.org/r/20220719071529.1081166-1-windhl@126.com
Fixes: 3e50858d4247 ("ufs: Add regulator enable support")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Liang He <windhl@126.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoice: do not setup vlan for loopback VSI
Maciej Fijalkowski [Thu, 7 Jul 2022 10:20:43 +0000 (12:20 +0200)]
ice: do not setup vlan for loopback VSI

commit dba98e26834a71ac51b08cc4a42297e6f0b674b4 upstream.

Currently loopback test is failiing due to the error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI.

Fixes: 9616d5b28366 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
Maciej Fijalkowski [Thu, 7 Jul 2022 10:20:42 +0000 (12:20 +0200)]
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)

commit 00b0b7c781e26c627a3e39fae02fbfe3666474ad upstream.

Tx side sets EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq when
it was sent. These bits should not be checked on completion path
regardless whether it's the Tx or the Rx. DD bit serves this purpose and
it indicates that a particular descriptor is either for Rx or was
successfully Txed. EOF is also set as loopback test does not xmit
fragmented frames.

Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead
of EOP and RS pair.

Fixes: 9616d5b28366 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:17 +0000 (09:50 -0700)]
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.

commit 7b8512603083cf7666210748820e9b22a8977b2a upstream.

While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1cb407385d84 ("net-tcp: Disable TCP ssthresh metrics cache by default")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_nometrics_save.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:16 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_nometrics_save.

commit 87ba594f7debe5939db9c71badd898b3590eee6d upstream.

While reading sysctl_tcp_nometrics_save, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_frto.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:15 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_frto.

commit efe9053f85e111d361149bf145d03e8b87056bf1 upstream.

While reading sysctl_tcp_frto, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_adv_win_scale.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:14 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.

commit 5a7221fb7c6056802b9aadb759fed724ab51ce55 upstream.

While reading sysctl_tcp_adv_win_scale, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix a data-race around sysctl_tcp_app_win.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:13 +0000 (09:50 -0700)]
tcp: Fix a data-race around sysctl_tcp_app_win.

commit e5fb1258cb0e9b64984961327781de0a241cc4c6 upstream.

While reading sysctl_tcp_app_win, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotcp: Fix data-races around sysctl_tcp_dsack.
Kuniyuki Iwashima [Wed, 20 Jul 2022 16:50:12 +0000 (09:50 -0700)]
tcp: Fix data-races around sysctl_tcp_dsack.

commit 92ffadf97267a2ae8eefa7b86a40d3463ef06820 upstream.

While reading sysctl_tcp_dsack, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agowatch_queue: Fix missing locking in add_watch_to_object()
Linus Torvalds [Thu, 28 Jul 2022 09:31:12 +0000 (10:31 +0100)]
watch_queue: Fix missing locking in add_watch_to_object()

commit 0686b3daa5694bf6fdcb20c1ddf1a27da0229c79 upstream.

If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.

KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.

While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list.  The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.

Fix this by locking the watcher list in add_watch_to_object().

Fixes: 175295c8c9fa ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agowatch_queue: Fix missing rcu annotation
David Howells [Thu, 28 Jul 2022 09:31:06 +0000 (10:31 +0100)]
watch_queue: Fix missing rcu annotation

commit 45ebef626c9d270c118de7809f57b700ea38d18d upstream.

Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).

Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.

Fixes: 175295c8c9fa ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
Nathan Chancellor [Mon, 25 Jul 2022 23:36:29 +0000 (16:36 -0700)]
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()

commit 287d2fe4ec129c1a4215b079d877c5ff80e04bd7 upstream.

When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():

[    0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[    0.324928] Call trace:
[    0.324969]  __ubsan_handle_cfi_check_fail+0x58/0x60
[    0.325053]  __cfi_check_fail+0x3c/0x44
[    0.325120]  __cfi_slowpath_diag+0x178/0x200
[    0.325192]  drm_simple_kms_crtc_mode_valid+0x58/0x80
[    0.325279]  __drm_helper_update_and_validate+0x31c/0x464
...

The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.

Cc: stable@vger.kernel.org
Fixes: b7390008b818 ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <tpgxyz@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-nathan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonouveau/svm: Fix to migrate all requested pages
Alistair Popple [Wed, 20 Jul 2022 06:27:45 +0000 (16:27 +1000)]
nouveau/svm: Fix to migrate all requested pages

commit 7fa0943b8b1dd2274c1dd0bb3392816b12a893e7 upstream.

Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.

Fix the calculation so that the entire range will get migrated if
possible.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: f6e01c3642a3 ("drm/nouveau/svm: map pages after migration")
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
Cc: <stable@vger.kernel.org> # v5.8+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agos390/archrandom: prevent CPACF trng invocations in interrupt context
Harald Freudenberger [Wed, 13 Jul 2022 13:17:21 +0000 (15:17 +0200)]
s390/archrandom: prevent CPACF trng invocations in interrupt context

commit 43792a7e01e258a1522599c39fe4f6d953b5736f upstream.

This patch slightly reworks the s390 arch_get_random_seed_{int,long}
implementation: Make sure the CPACF trng instruction is never
called in any interrupt context. This is done by adding an
additional condition in_task().

Justification:

There are some constrains to satisfy for the invocation of the
arch_get_random_seed_{int,long}() functions:
- They should provide good random data during kernel initialization.
- They should not be called in interrupt context as the TRNG
  instruction is relatively heavy weight and may for example
  make some network loads cause to timeout and buck.

However, it was not clear what kind of interrupt context is exactly
encountered during kernel init or network traffic eventually calling
arch_get_random_seed_long().

After some days of investigations it is clear that the s390
start_kernel function is not running in any interrupt context and
so the trng is called:

Jul 11 18:33:39 t35lp54 kernel:  [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
Jul 11 18:33:39 t35lp54 kernel:  [<000000010715f246>] random_init+0xf6/0x238
Jul 11 18:33:39 t35lp54 kernel:  [<000000010712545c>] start_kernel+0x4a4/0x628
Jul 11 18:33:39 t35lp54 kernel:  [<000000010590402a>] startup_continue+0x2a/0x40

The condition in_task() is true and the CPACF trng provides random data
during kernel startup.

The network traffic however, is more difficult. A typical call stack
looks like this:

Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b560136>] crng_reseed+0x36/0xd8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5604b8>] crng_make_state+0x78/0x340
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b56108a>] get_random_u32+0xda/0x248
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611f98>] __alloc_skb+0x90/0x178
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f6526>] qeth_poll+0x256/0x3f8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63dbec>] net_rx_action+0x1cc/0x408
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b937302>] __do_softirq+0x132/0x6b0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf531a>] irq_exit_rcu+0x22/0x50
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b922506>] do_io_irq+0xe6/0x198
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935826>] io_int_handler+0xd6/0x110
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b9358a6>] psw_idle_exit+0x0/0xa
Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b933cfe>] default_idle_call+0x6e/0xd0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac59f4e>] do_idle+0xf6/0x1b0
Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
Jul 06 17:37:07 t35lp54 kernel:  [<000000008abb0d90>] smp_start_secondary+0x148/0x158
Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935b9e>] restart_int_handler+0x6e/0x90

which confirms that the call is in softirq context. So in_task() covers exactly
the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
not in soft irq but in normal task context and during kernel init.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com
Fixes: 3d09fce579fa ("s390/archrandom: simplify back to earlier design and initialize earlier")
[agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable]
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoasm-generic: remove a broken and needless ifdef conditional
Lukas Bulwahn [Fri, 22 Jul 2022 11:07:11 +0000 (13:07 +0200)]
asm-generic: remove a broken and needless ifdef conditional

commit 9a4f501d37ec66fb2ab097c4d3a6ba3ddb206d31 upstream.

Commit 5ea6e0ec50c1 ("lib: Add a generic version of devmem_is_allowed()")
introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then
falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB
in the reference) in ./include/asm-generic/io.h.

Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:

GENERIC_DEVMEM_IS_ALLOWED
Referencing files: include/asm-generic/io.h

The actual fix, though, is simply to not to make this function declaration
dependent on any kernel config. For architectures that intend to use
the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will
lead to picking the function definition, and for other architectures, this
function is simply defined elsewhere.

The wrong '#ifndef' on a non-existing config symbol also always had the
same effect (although more by mistake than by intent). So, there is no
functional change.

Remove this broken and needless ifdef conditional.

Fixes: 5ea6e0ec50c1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agohugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
Miaohe Lin [Sat, 9 Jul 2022 09:26:29 +0000 (17:26 +0800)]
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte

commit b1ac792fc18916a4cfe8dc540e5b1f990ca54b4d upstream.

When alloc_huge_page fails, *pagep is set to NULL without put_page first.
So the hugepage indicated by *pagep is leaked.

Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
Fixes: f741324156c6 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agomm: fix page leak with multiple threads mapping the same page
Josef Bacik [Tue, 5 Jul 2022 20:00:36 +0000 (16:00 -0400)]
mm: fix page leak with multiple threads mapping the same page

commit 17036d50aca9bf0525c4d551015d4856a4991632 upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size.  This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped.  One thread maps
the PMD, the other thread loses the race and then returns 0.  However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page.  We
actually did the correct thing prior to 33fe6a60a178, however it looks
like Kirill copied what we do in the anonymous page case.  In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything.  Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[josef@toxicpanda.com: v2]
Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: 33fe6a60a178 ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agosecretmem: fix unhandled fault in truncate
Mike Rapoport [Thu, 7 Jul 2022 16:56:50 +0000 (19:56 +0300)]
secretmem: fix unhandled fault in truncate

commit ecf631530d3a6c14b6cc73405e3d39a2f053b94d upstream.

syzkaller reports the following issue:

BUG: unable to handle page fault for address: ffff888021f7e005
PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
Oops: 0002 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 zero_user_segments include/linux/highmem.h:272 [inline]
 folio_zero_range include/linux/highmem.h:428 [inline]
 truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
 truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
 truncate_inode_pages mm/truncate.c:452 [inline]
 truncate_pagecache+0x63/0x90 mm/truncate.c:753
 simple_setattr+0xed/0x110 fs/libfs.c:535
 secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
 notify_change+0xb8c/0x12b0 fs/attr.c:424
 do_truncate+0x13c/0x200 fs/open.c:65
 do_sys_ftruncate+0x536/0x730 fs/open.c:193
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb29d900899
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
 </TASK>
Modules linked in:
CR2: ffff888021f7e005
---[ end trace 0000000000000000 ]---

Eric Biggers suggested that this happens when
secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
a page that is faulted in by secretmem_fault() (and thus removed from the
direct map) is zeroed by inode truncation right afterwards.

Use mapping->invalidate_lock to make secretmem_fault() and
secretmem_setattr() mutually exclusive.

[rppt@linux.ibm.com: v3]
Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
Reported-by: syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agofs: sendfile handles O_NONBLOCK of out_fd
Andrei Vagin [Sun, 17 Jul 2022 04:37:10 +0000 (21:37 -0700)]
fs: sendfile handles O_NONBLOCK of out_fd

commit 7137be9e4fe77b772cbf80f721c9ad1e088e9402 upstream.

sendfile has to return EAGAIN if out_fd is nonblocking and the write into
it would block.

Here is a small reproducer for the problem:

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sendfile.h>

#define FILE_SIZE (1UL << 30)
int main(int argc, char **argv) {
        int p[2], fd;

        if (pipe2(p, O_NONBLOCK))
                return 1;

        fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
        if (fd < 0)
                return 1;
        ftruncate(fd, FILE_SIZE);

        if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                fprintf(stderr, "FAIL\n");
        }
        if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                fprintf(stderr, "FAIL\n");
        }
        return 0;
}

It worked before cb2aed2dcb16, it is stuck after cb2aed2dcb16, and it
works again with this fix.

This regression occurred because do_splice_direct() calls pipe_write
that handles O_NONBLOCK.  Here is a trace log from the reproducer:

 1)               |  __x64_sys_sendfile64() {
 1)               |    do_sendfile() {
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      __fdget()
 1)               |      rw_verify_area()
 1)               |      do_splice_direct() {
 1)               |        rw_verify_area()
 1)               |        splice_direct_to_actor() {
 1)               |          do_splice_to() {
 1)               |            rw_verify_area()
 1)               |            generic_file_splice_read()
 1) + 74.153 us   |          }
 1)               |          direct_splice_actor() {
 1)               |            iter_file_splice_write() {
 1)               |              __kmalloc()
 1)   0.148 us    |              pipe_lock();
 1)   0.153 us    |              splice_from_pipe_next.part.0();
 1)   0.162 us    |              page_cache_pipe_buf_confirm();
... 16 times
 1)   0.159 us    |              page_cache_pipe_buf_confirm();
 1)               |              vfs_iter_write() {
 1)               |                do_iter_write() {
 1)               |                  rw_verify_area()
 1)               |                  do_iter_readv_writev() {
 1)               |                    pipe_write() {
 1)               |                      mutex_lock()
 1)   0.153 us    |                      mutex_unlock();
 1)   1.368 us    |                    }
 1)   1.686 us    |                  }
 1)   5.798 us    |                }
 1)   6.084 us    |              }
 1)   0.174 us    |              kfree();
 1)   0.152 us    |              pipe_unlock();
 1) + 14.461 us   |            }
 1) + 14.783 us   |          }
 1)   0.164 us    |          page_cache_pipe_buf_release();
... 16 times
 1)   0.161 us    |          page_cache_pipe_buf_release();
 1)               |          touch_atime()
 1) + 95.854 us   |        }
 1) + 99.784 us   |      }
 1) ! 107.393 us  |    }
 1) ! 107.699 us  |  }

Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
Fixes: cb2aed2dcb16 ("teach sendfile(2) to handle send-to-pipe directly")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agontfs: fix use-after-free in ntfs_ucsncmp()
ChenXiaoSong [Thu, 7 Jul 2022 10:53:29 +0000 (18:53 +0800)]
ntfs: fix use-after-free in ntfs_ucsncmp()

commit a2f6479d2c63b54f4e0d14673b14255862cec64d upstream.

Syzkaller reported use-after-free bug as follows:

==================================================================
BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
Read of size 2 at addr ffff8880751acee8 by task a.out/879

CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1c0/0x2b0
 print_address_description.constprop.0.cold+0xd4/0x484
 print_report.cold+0x55/0x232
 kasan_report+0xbf/0xf0
 ntfs_ucsncmp+0x123/0x130
 ntfs_are_names_equal.cold+0x2b/0x41
 ntfs_attr_find+0x43b/0xb90
 ntfs_attr_lookup+0x16d/0x1e0
 ntfs_read_locked_attr_inode+0x4aa/0x2360
 ntfs_attr_iget+0x1af/0x220
 ntfs_read_locked_inode+0x246c/0x5120
 ntfs_iget+0x132/0x180
 load_system_files+0x1cc6/0x3480
 ntfs_fill_super+0xa66/0x1cf0
 mount_bdev+0x38d/0x460
 legacy_get_tree+0x10d/0x220
 vfs_get_tree+0x93/0x300
 do_new_mount+0x2da/0x6d0
 path_mount+0x496/0x19d0
 __x64_sys_mount+0x284/0x300
 do_syscall_64+0x3b/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f3f2118d9ea
Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

The buggy address belongs to the physical page:
page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
memcg:ffff888101f7e180
anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                                          ^
 ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

The reason is that struct ATTR_RECORD->name_offset is 6485, end address of
name string is out of bounds.

Fix this by adding sanity check on end address of attribute name string.

[akpm@linux-foundation.org: coding-style cleanups]
[chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "ocfs2: mount shared volume without ha stack"
Junxiao Bi [Fri, 3 Jun 2022 22:28:01 +0000 (15:28 -0700)]
Revert "ocfs2: mount shared volume without ha stack"

commit ff896a9942d56799a7cce1dedc4b4f0c5fbaed11 upstream.

This reverts commit dfe742023c9d02b7dc3403d43136794d796fe2b0.

This commit introduced a regression that can cause mount hung.  The
changes in __ocfs2_find_empty_slot causes that any node with none-zero
node number can grab the slot that was already taken by node 0, so node 1
will access the same journal with node 0, when it try to grab journal
cluster lock, it will hung because it was already acquired by node 0.
It's very easy to reproduce this, in one cluster, mount node 0 first, then
node 1, you will see the following call trace from node 1.

[13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
[13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
[13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
[13148.749354] Call Trace:
[13148.750718]  <TASK>
[13148.752019]  ? usleep_range+0x90/0x89
[13148.753882]  __schedule+0x210/0x567
[13148.755684]  schedule+0x44/0xa8
[13148.757270]  schedule_timeout+0x106/0x13c
[13148.759273]  ? __prepare_to_swait+0x53/0x78
[13148.761218]  __wait_for_common+0xae/0x163
[13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
[13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
[13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
[13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
[13148.775401]  ? iput+0x69/0xba
[13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
[13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
[13148.781756]  mount_bdev+0x190/0x1b7
[13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
[13148.785634]  legacy_get_tree+0x27/0x48
[13148.787466]  vfs_get_tree+0x25/0xd0
[13148.789270]  do_new_mount+0x18c/0x2d9
[13148.791046]  __x64_sys_mount+0x10e/0x142
[13148.792911]  do_syscall_64+0x3b/0x89
[13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
[13148.797051] RIP: 0033:0x7f2309f6e26e
[13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
[13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
[13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
[13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
[13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
[13148.816564]  </TASK>

To fix it, we can just fix __ocfs2_find_empty_slot.  But original commit
introduced the feature to mount ocfs2 locally even it is cluster based,
that is a very dangerous, it can easily cause serious data corruption,
there is no way to stop other nodes mounting the fs and corrupting it.
Setup ha or other cluster-aware stack is just the cost that we have to
take for avoiding corruption, otherwise we have to do it in kernel.

Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
Fixes: dfe742023c9d("ocfs2: mount shared volume without ha stack")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Luiz Augusto von Dentz [Thu, 21 Jul 2022 16:10:50 +0000 (09:10 -0700)]
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put

commit 5c507b737a50a8bc0933e2447db9f625899cabef upstream.

This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

  refcount_t: increment on 0; use-after-free.
  BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
  Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

  CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
  4.14.234-00003-g1fb6d0bd49a4-dirty #28
  Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
  Google Inc. MSM sm8150 Flame DVT (DT)
  Workqueue: hci0 hci_rx_work
  Call trace:
   dump_backtrace+0x0/0x378
   show_stack+0x20/0x2c
   dump_stack+0x124/0x148
   print_address_description+0x80/0x2e8
   __kasan_report+0x168/0x188
   kasan_report+0x10/0x18
   __asan_load4+0x84/0x8c
   refcount_dec_and_test+0x20/0xd0
   l2cap_chan_put+0x48/0x12c
   l2cap_recv_frame+0x4770/0x6550
   l2cap_recv_acldata+0x44c/0x7a4
   hci_acldata_packet+0x100/0x188
   hci_rx_work+0x178/0x23c
   process_one_work+0x35c/0x95c
   worker_thread+0x4cc/0x960
   kthread+0x1a8/0x1c4
   ret_from_fork+0x10/0x18

Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoLinux 5.15.58
Greg Kroah-Hartman [Fri, 29 Jul 2022 15:25:34 +0000 (17:25 +0200)]
Linux 5.15.58

Link: https://lore.kernel.org/r/20220727161026.977588183@linuxfoundation.org
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220728133327.660846209@linuxfoundation.org
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Fix wrong format specifier in amdgpu_dm.c
Hayden Goodfellow [Mon, 13 Sep 2021 01:32:09 +0000 (21:32 -0400)]
drm/amd/display: Fix wrong format specifier in amdgpu_dm.c

commit c089f90591716d3e83d07dcdff3d9f293c80b2d0 upstream.

[Why]
Currently, the 32bit kernel build fails due to an incorrect string
format specifier. ARRAY_SIZE() returns size_t type as it uses sizeof().
However, we specify it in a string as %ld. This causes a compiler error
and causes the 32bit build to fail.

[How]
Change the %ld to %zu as size_t (which sizeof() returns) is an unsigned
integer data type. We use 'z' to ensure it also works with 64bit build.

Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hayden Goodfellow <Hayden.Goodfellow@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/entry_32: Fix segment exceptions
Peter Zijlstra [Tue, 11 Jan 2022 11:11:14 +0000 (12:11 +0100)]
x86/entry_32: Fix segment exceptions

commit 44970aa89e4263d439537770598b3b2bf17dc6f8 upstream.

The LKP robot reported that commit in Fixes: caused a failure. Turns out
the ldt_gdt_32 selftest turns into an infinite loop trying to clear the
segment.

As discovered by Sean, what happens is that PARANOID_EXIT_TO_KERNEL_MODE
in the handle_exception_return path overwrites the entry stack data with
the task stack data, restoring the "bad" segment value.

Instead of having the exception retry the instruction, have it emulate
the full instruction. Replace EX_TYPE_POP_ZERO with EX_TYPE_POP_REG
which will do the equivalent of: POP %reg; MOV $imm, %reg.

In order to encode the segment registers, add them as registers 8-11 for
32-bit.

By setting regs->[defg]s the (nested) RESTORE_REGS will pop this value
at the end of the exception handler and by increasing regs->sp, it will
have skipped the stack slot.

This was debugged by Sean Christopherson <seanjc@google.com>.

 [ bp: Add EX_REG_GS too. ]

Fixes: a83e73a18cda ("x86/entry_32: Remove .fixup usage")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/Yd1l0gInc4zRcnt/@hirez.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()
Dan Carpenter [Mon, 16 May 2022 07:05:48 +0000 (10:05 +0300)]
drm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()

commit d8a357549b217847aaf1e9b56871497c46b39440 upstream.

The > ARRAY_SIZE() should be >= ARRAY_SIZE() to prevent an out of bounds
access.

Fixes: 5c4f2ed1ef60 ("drm/amd/display: Support for DMUB HPD interrupt handling")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
Jan Beulich [Tue, 7 Jun 2022 15:00:53 +0000 (17:00 +0200)]
x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()

commit 3acd1e2908eccf3142f274c1ddc05a85de2d5293 upstream.

As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
and clobbering of "cc" don't work well together. The compiler appears to
mean to reject such, but doesn't - in its upstream form - quite manage
to yet for "cc". Furthermore two similar macros don't clobber "cc", and
clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
always assumes status flags to be clobbered there.

Fixes: 3c75cc27a67d ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoKVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness
Maxim Levitsky [Thu, 12 May 2022 10:14:20 +0000 (13:14 +0300)]
KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness

commit eeb056d210dbb5c70ad6182943fd8165081aca80 upstream.

This shows up as a TDP MMU leak when running nested.  Non-working cmpxchg on L0
relies makes L1 install two different shadow pages under same spte, and one of
them is leaked.

Fixes: 557b4e681631b ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220512101420.306759-1-mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/extable: Prefer local labels in .set directives
Nick Desaulniers [Tue, 29 Mar 2022 20:21:45 +0000 (13:21 -0700)]
x86/extable: Prefer local labels in .set directives

commit f2bf3e9d041eb10bdafb31b37fc2776fc4a90d51 upstream.

Bernardo reported an error that Nathan bisected down to
(x86_64) defconfig+LTO_CLANG_FULL+X86_PMEM_LEGACY.

    LTO     vmlinux.o
  ld.lld: error: <instantiation>:1:13: redefinition of 'found'
  .set found, 0
              ^

  <inline asm>:29:1: while in macro instantiation
  extable_type_reg reg=%eax, type=(17 | ((0) << 16))
  ^

This appears to be another LTO specific issue similar to what was folded
into commit f8966b33a3a9 ("x86/extable: Extend extable functionality"),
where the `.set found, 0` in DEFINE_EXTABLE_TYPE_REG in
arch/x86/include/asm/asm.h conflicts with the symbol for the static
function `found` in arch/x86/kernel/pmem.c.

Assembler .set directive declare symbols with global visibility, so the
assembler may not rename such symbols in the event of a conflict. LTO
could rename static functions if there was a conflict in C sources, but
it cannot see into symbols defined in inline asm.

The symbols are also retained in the symbol table, regardless of LTO.

Give the symbols .L prefixes making them locally visible, so that they
may be renamed for LTO to avoid conflicts, and to drop them from the
symbol table regardless of LTO.

Fixes: f8966b33a3a9 ("x86/extable: Extend extable functionality")
Reported-by: Bernardo Meurer Costa <beme@google.com>
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220329202148.2379697-1-ndesaulniers@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: invalid parameter check in dmub_hpd_callback
José Expósito [Sun, 9 Jan 2022 18:42:45 +0000 (19:42 +0100)]
drm/amd/display: invalid parameter check in dmub_hpd_callback

commit 40b318d067f55ae66e0871a4f744f3c57f244cdc upstream.

The function performs a check on the "adev" input parameter, however, it
is used before the check.

Initialize the "dev" variable after the sanity check to avoid a possible
NULL pointer dereference.

Fixes: 5c4f2ed1ef609 ("drm/amd/display: Support for DMUB HPD interrupt handling")
Addresses-Coverity-ID: 1493909 ("Null pointer dereference")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Don't lock connection_mutex for DMUB HPD
Nicholas Kazlauskas [Thu, 4 Nov 2021 20:52:06 +0000 (16:52 -0400)]
drm/amd/display: Don't lock connection_mutex for DMUB HPD

commit ed4d2bb7b0a8b567e356a743ce32fbc16bbbc0ce upstream.

[Why]
Per DRM spec we only need to hold that lock when touching
connector->state - which we do not do in that handler.

Taking this locking introduces unnecessary dependencies with other
threads which is bad for performance and opens up the potential for
a deadlock since there are multiple locks being held at once.

[How]
Remove the connection_mutex lock/unlock routine and just iterate over
the drm connectors normally. The iter helpers implicitly lock the
connection list so this is safe to do.

DC link access also does not need to be guarded since the link
table is static at creation - we don't dynamically add or remove links,
just streams.

Fixes: 5c4f2ed1ef60 ("drm/amd/display: Support for DMUB HPD interrupt handling")
Reviewed-by: Jude Shih <shenshih@amd.com>
Acked-by: Anson Jacob <Anson.Jacob@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agowatch-queue: remove spurious double semicolon
Linus Torvalds [Thu, 21 Jul 2022 17:30:14 +0000 (10:30 -0700)]
watch-queue: remove spurious double semicolon

commit 8afd05ad9da179654179cf6ad11cbacdb0d31964 upstream.

Sedat Dilek noticed that I had an extraneous semicolon at the end of a
line in the previous patch.

It's harmless, but unintentional, and while compilers just treat it as
an extra empty statement, for all I know some other tooling might warn
about it. So clean it up before other people notice too ;)

Fixes: e58a2c354f8b ("watchqueue: make sure to serialize 'wqueue->defunct' properly")
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet: usb: ax88179_178a needs FLAG_SEND_ZLP
Jose Alonso [Mon, 13 Jun 2022 18:32:44 +0000 (15:32 -0300)]
net: usb: ax88179_178a needs FLAG_SEND_ZLP

commit 25855cda31d6def26952aebba6d4f96c285e9d62 upstream.

The extra byte inserted by usbnet.c when
 (length % dev->maxpacket == 0) is causing problems to device.

This patch sets FLAG_SEND_ZLP to avoid this.

Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet

Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
   "message authentication code incorrect"
   This happens because the tcp message sent is corrupted during the
   USB "Bulk out". The device calculate the tcp checksum and send a
   valid tcp message to the remote sshd. Then the encryption detects
   the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Stop normal work without any log message.
   The "Bulk in" continue receiving packets normally.
   The host sends "Bulk out" and the device responds with -ECONNRESET.
   (The netusb.c code tx_complete ignore -ECONNRESET)
Under normal conditions these errors take days to happen and in
intense usage take hours.

A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================

Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
Jiri Slaby [Thu, 7 Jul 2022 08:25:58 +0000 (10:25 +0200)]
tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()

commit 5796b7564c773698503d01d3089f99bd4b0abdcd upstream.

There is a race in pty_write(). pty_write() can be called in parallel
with e.g. ioctl(TIOCSTI) or ioctl(TCXONC) which also inserts chars to
the buffer. Provided, tty_flip_buffer_push() in pty_write() is called
outside the lock, it can commit inconsistent tail. This can lead to out
of bounds writes and other issues. See the Link below.

To fix this, we have to introduce a new helper called
tty_insert_flip_string_and_push_buffer(). It does both
tty_insert_flip_string() and tty_flip_buffer_commit() under the port
lock. It also calls queue_work(), but outside the lock. See
21b6b371430c (pty: do tty_flip_buffer_push without port->lock in
pty_write) for the reasons.

Keep the helper internal-only (in drivers' tty.h). It is not intended to
be used widely.

Link: https://seclists.org/oss-sec/2022/q2/155
Fixes: 21b6b371430c (pty: do tty_flip_buffer_push without port->lock in pty_write)
Cc: 一只狗 <chennbnbnb@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20220707082558.9250-2-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
Jiri Slaby [Thu, 7 Jul 2022 08:25:57 +0000 (10:25 +0200)]
tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()

commit 0b16a53c573489c8ee9962fd84f07a9f7156eab6 upstream.

We will need this new helper in the next patch.

Cc: Hillf Danton <hdanton@sina.com>
Cc: 一只狗 <chennbnbnb@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20220707082558.9250-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotty: drop tty_schedule_flip()
Jiri Slaby [Mon, 22 Nov 2021 11:16:48 +0000 (12:16 +0100)]
tty: drop tty_schedule_flip()

commit 8ad7f4cff55e68077d638a3386b71c699a572b06 upstream.

Since commit 0aa8fd8ce11e7 (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). All
users were converted in the previous patches, so remove
tty_schedule_flip() completely while inlining its body into
tty_flip_buffer_push().

One less exported function.

Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20211122111648.30379-4-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotty: the rest, stop using tty_schedule_flip()
Jiri Slaby [Mon, 22 Nov 2021 11:16:47 +0000 (12:16 +0100)]
tty: the rest, stop using tty_schedule_flip()

commit 3060e700d7032a4abb3735dc41f527f6b0d15bbb upstream.

Since commit 0aa8fd8ce11e7 (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). We are
going to remove the latter (as it is used less), so call the former in
the rest of the users.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: William Hubbs <w.d.hubbs@gmail.com>
Cc: Chris Brannon <chris@the-brannons.com>
Cc: Kirk Reiser <kirk@reisers.ca>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20211122111648.30379-3-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotty: drivers/tty/, stop using tty_schedule_flip()
Jiri Slaby [Mon, 22 Nov 2021 11:16:46 +0000 (12:16 +0100)]
tty: drivers/tty/, stop using tty_schedule_flip()

commit 3651b4818639972bcd80d2ff141d27121d534b3b upstream.

Since commit 0aa8fd8ce11e7 (tty: Fix low_latency BUG) in 2014,
tty_flip_buffer_push() is only a wrapper to tty_schedule_flip(). We are
going to remove the latter (as it is used less), so call the former in
drivers/tty/.

Cc: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20211122111648.30379-2-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agowatchqueue: make sure to serialize 'wqueue->defunct' properly
Linus Torvalds [Tue, 19 Jul 2022 18:09:01 +0000 (11:09 -0700)]
watchqueue: make sure to serialize 'wqueue->defunct' properly

commit e58a2c354f8bf5c53065c5eb14846408c9d43dfd upstream.

When the pipe is closed, we mark the associated watchqueue defunct by
calling watch_queue_clear().  However, while that is protected by the
watchqueue lock, new watchqueue entries aren't actually added under that
lock at all: they use the pipe->rd_wait.lock instead, and looking up
that pipe happens without any locking.

The watchqueue code uses the RCU read-side section to make sure that the
wqueue entry itself hasn't disappeared, but that does not protect the
pipe_info in any way.

So make sure to actually hold the wqueue lock when posting watch events,
properly serializing against the pipe being torn down.

Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/alternative: Report missing return thunk details
Kees Cook [Wed, 13 Jul 2022 21:38:19 +0000 (14:38 -0700)]
x86/alternative: Report missing return thunk details

commit 27d13ec1e2cb4a2d2316a30860e4505376d15a03 upstream.

Debugging missing return thunks is easier if we can see where they're
happening.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/amd: Use IBPB for firmware calls
Peter Zijlstra [Mon, 18 Jul 2022 11:41:37 +0000 (13:41 +0200)]
x86/amd: Use IBPB for firmware calls

commit e3abb3ac20c026872da1bea6bab37dcd334fde1e upstream.

On AMD IBRS does not prevent Retbleed; as such use IBPB before a
firmware call to flush the branch history state.

And because in order to do an EFI call, the kernel maps a whole lot of
the kernel page table into the EFI page table, do an IBPB just in case
in order to prevent the scenario of poisoning the BTB and causing an EFI
call using the unprotected RET there.

  [ bp: Massage. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Fix surface optimization regression on Carrizo
Nicholas Kazlauskas [Tue, 12 Oct 2021 14:04:03 +0000 (10:04 -0400)]
drm/amd/display: Fix surface optimization regression on Carrizo

commit 3d3f372da5f84311db8926a6cb1321d6d423a84d upstream.

[Why]
DCE legacy optimization path isn't well tested under new DC optimization
flow which can result in underflow occuring when initializing X11 on
Carrizo.

[How]
Retain the legacy optimization flow for DCE and keep the new one for DCN
to satisfy optimizations being correctly applied for ASIC that can
support it.

Fixes: a6011e22856c60 ("drm/amd/display: Optimize bandwidth on following fast update")
Reported-by: Tom St Denis <tom.stdenis@amd.com>
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Optimize bandwidth on following fast update
Nicholas Kazlauskas [Thu, 19 Aug 2021 20:36:09 +0000 (16:36 -0400)]
drm/amd/display: Optimize bandwidth on following fast update

commit a6011e22856c60695450bccc89a40a6b1dd7206e upstream.

[Why]
The current call to optimize_bandwidth never occurs because flip is
always pending from the FULL and FAST updates.

[How]
Optimize on the following flip when it's a FAST update and we know we
aren't going to be modifying the clocks again.

Reviewed-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com>
Acked-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Reset DMCUB before HW init
Nicholas Kazlauskas [Fri, 10 Dec 2021 23:04:05 +0000 (15:04 -0800)]
drm/amd/display: Reset DMCUB before HW init

commit 301e6095afc56f199c0723696f7b92b3a0ceb446 upstream.

[Why]
If the firmware wasn't reset by PSP or HW and is currently running
then the firmware will hang or perform underfined behavior when we
modify its firmware state underneath it.

[How]
Reset DMCUB before setting up cache windows and performing HW init.

Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoexfat: use updated exfat_chain directly during renaming
Sungjong Seo [Tue, 7 Jun 2022 15:05:21 +0000 (00:05 +0900)]
exfat: use updated exfat_chain directly during renaming

commit e63a87fa5be336fa69e7bcb33e3c9ea25f4cc91f upstream.

In order for a file to access its own directory entry set,
exfat_inode_info(ei) has two copied values. One is ei->dir, which is
a snapshot of exfat_chain of the parent directory, and the other is
ei->entry, which is the offset of the start of the directory entry set
in the parent directory.

Since the parent directory can be updated after the snapshot point,
it should be used only for accessing one's own directory entry set.

However, as of now, during renaming, it could try to traverse or to
allocate clusters via snapshot values, it does not make sense.

This potential problem has been revealed when exfat_update_parent_info()
was removed by commit e7ebf0fa07c1 ("exfat: fix referencing wrong parent
directory information after renaming"). However, I don't think it's good
idea to bring exfat_update_parent_info() back.

Instead, let's use the updated exfat_chain of parent directory diectly.

Fixes: e7ebf0fa07c1 ("exfat: fix referencing wrong parent directory information after renaming")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Luiz Augusto von Dentz [Tue, 15 Feb 2022 01:59:38 +0000 (17:59 -0800)]
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks

commit d3422715c1ea69459d0dba7cfa352dd3975a9025 upstream.

Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
shall return the partial chunks it could allocate instead of freeing
everything as otherwise it can cause problems like bellow.

Fixes: 11fa3ef31d3c ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan)
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: SCO: Fix sco_send_frame returning skb->len
Luiz Augusto von Dentz [Thu, 16 Sep 2021 20:10:49 +0000 (13:10 -0700)]
Bluetooth: SCO: Fix sco_send_frame returning skb->len

commit b8fe61a692b741c68ef1888a26481fc9fb6f3025 upstream.

The skb in modified by hci_send_sco which pushes SCO headers thus
changing skb->len causing sco_sock_sendmsg to fail.

Fixes: 99a849f60da6 ("Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg")
Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: Fix passing NULL to PTR_ERR
Luiz Augusto von Dentz [Thu, 16 Sep 2021 20:10:48 +0000 (13:10 -0700)]
Bluetooth: Fix passing NULL to PTR_ERR

commit fb9b1cfeb4d439f24729b7eb8bdd2a2197ea0ac9 upstream.

Passing NULL to PTR_ERR will result in 0 (success), also since the likes of
bt_skb_sendmsg does never return NULL it is safe to replace the instances of
IS_ERR_OR_NULL with IS_ERR when checking its return.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg
Luiz Augusto von Dentz [Fri, 3 Sep 2021 22:27:32 +0000 (15:27 -0700)]
Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg

commit 11fa3ef31d3c7151200433fc9caefc1d0d79b113 upstream.

This makes use of bt_skb_sendmmsg instead using memcpy_from_msg which
is not considered safe to be used when lock_sock is held.

Also make rfcomm_dlc_send handle skb with fragments and queue them all
atomically.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg
Luiz Augusto von Dentz [Fri, 3 Sep 2021 22:27:31 +0000 (15:27 -0700)]
Bluetooth: SCO: Replace use of memcpy_from_msg with bt_skb_sendmsg

commit 99a849f60da6f6a89c8dae472c0bcc4b21ca9b91 upstream.

This makes use of bt_skb_sendmsg instead of allocating a different
buffer to be used with memcpy_from_msg which cause one extra copy.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: Add bt_skb_sendmmsg helper
Luiz Augusto von Dentz [Fri, 3 Sep 2021 22:27:30 +0000 (15:27 -0700)]
Bluetooth: Add bt_skb_sendmmsg helper

commit 123b83e2dea4301a3d06a8589333432ddee34b38 upstream.

This works similarly to bt_skb_sendmsg but can split the msg into
multiple skb fragments which is useful for stream sockets.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoBluetooth: Add bt_skb_sendmsg helper
Luiz Augusto von Dentz [Fri, 3 Sep 2021 22:27:29 +0000 (15:27 -0700)]
Bluetooth: Add bt_skb_sendmsg helper

commit 1b696bf223cdab8b76a67e8a3568d2f0acb51b77 upstream.

bt_skb_sendmsg helps takes care of allocation the skb and copying the
the contents of msg over to the skb while checking for possible errors
so it should be safe to call it without holding lock_sock.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoum: virtio_uml: Fix broken device handling in time-travel
Johannes Berg [Tue, 17 May 2022 20:52:50 +0000 (22:52 +0200)]
um: virtio_uml: Fix broken device handling in time-travel

[ Upstream commit 771c89cc218093f6e5cb462d0831e38a74f778ba ]

If a device implementation crashes, virtio_uml will mark it
as dead by calling virtio_break_device() and scheduling the
work that will remove it.

This still seems like the right thing to do, but it's done
directly while reading the message, and if time-travel is
used, this is in the time-travel handler, outside of the
normal Linux machinery. Therefore, we cannot acquire locks
or do normal "linux-y" things because e.g. lockdep will be
confused about the context.

Move handling this situation out of the read function and
into the actual IRQ handler and response handling instead,
so that in the case of time-travel we don't call it in the
wrong context.

Chances are the system will still crash immediately, since
the device implementation crashing may also cause the time-
travel controller to go down, but at least all of that now
happens without strange warnings from lockdep.

Fixes: d60af75a979e ("um: time-travel: rework interrupt handling in ext mode")
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoum: virtio_uml: Allow probing from devicetree
Vincent Whitchurch [Tue, 21 Dec 2021 09:04:46 +0000 (10:04 +0100)]
um: virtio_uml: Allow probing from devicetree

[ Upstream commit 6cf4a96818b1b1af037a7489232ab5126c41cd6c ]

Allow the virtio_uml device to be probed from the devicetree so that
sub-devices can be specified using the standard virtio bindings, for
example:

  virtio@1 {
    compatible = "virtio,uml";
    socket-path = "i2c.sock";
    virtio-device-id = <0x22>;

    i2c-controller {
      compatible = "virtio,device22";
      #address-cells = <0x01>;
      #size-cells = <0x00>;

      light-sensor@01 {
        compatible = "ti,opt3001";
        reg = <0x01>;
      };
    };
  };

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotracing: Fix return value of trace_pid_write()
Wonhyuk Yang [Tue, 3 May 2022 05:05:46 +0000 (14:05 +0900)]
tracing: Fix return value of trace_pid_write()

[ Upstream commit 1aabec7328e019b542742f02dd2aedd2121cc7cf ]

Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 4
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    write(1, "\n", 1)                       = 0
    ....

This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.

The result of applied patch is below.

    $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
    execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
    ...
    write(1, "123 \n", 5)                   = 5
    close(1)                                = 0

Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Baik Song An <bsahn@etri.re.kr>
Cc: Hong Yeon Kim <kimhy@etri.re.kr>
Cc: Taeung Song <taeung@reallinux.co.kr>
Cc: linuxgeek@linuxgeek.io
Cc: stable@vger.kernel.org
Fixes: 4b20a42270595 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotracing: Place trace_pid_list logic into abstract functions
Steven Rostedt (VMware) [Fri, 24 Sep 2021 01:03:49 +0000 (21:03 -0400)]
tracing: Place trace_pid_list logic into abstract functions

[ Upstream commit ed6799d3b08a748f55079edb7b92c723b06397ee ]

Instead of having the logic that does trace_pid_list open coded, wrap it in
abstract functions. This will allow a rewrite of the logic that implements
the trace_pid_list without affecting the users.

Note, this causes a change in behavior. Every time a pid is written into
the set_*_pid file, it creates a new list and uses RCU to update it. If
pid_max is lowered, but there was a pid currently in the list that was
higher than pid_max, those pids will now be removed on updating the list.
The old behavior kept that from happening.

The rewrite of the pid_list logic will no longer depend on pid_max,
and will return the old behavior.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agotracing: Have event format check not flag %p* on __get_dynamic_array()
Steven Rostedt (Google) [Thu, 7 Apr 2022 18:56:32 +0000 (14:56 -0400)]
tracing: Have event format check not flag %p* on __get_dynamic_array()

[ Upstream commit cd644a2f0bdd5c80c235522d269cf9eb5e998fb8 ]

The print fmt check against trace events to make sure that the format does
not use pointers that may be freed from the time of the trace to the time
the event is read, gives a false positive on %pISpc when reading data that
was saved in __get_dynamic_array() when it is perfectly fine to do so, as
the data being read is on the ring buffer.

Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/
Cc: stable@vger.kernel.org
Fixes: 9957643b19751 ("tracing: Add check of trace event print fmts for dereferencing pointers")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoexfat: fix referencing wrong parent directory information after renaming
Yuezhang Mo [Mon, 4 Apr 2022 02:58:06 +0000 (11:58 +0900)]
exfat: fix referencing wrong parent directory information after renaming

[ Upstream commit e7ebf0fa07c1ffb17925e9912d68594b69e20a7c ]

During renaming, the parent directory information maybe
updated. But the file/directory still references to the
old parent directory information.

This bug will cause 2 problems.

(1) The renamed file can not be written.

    [10768.175172] exFAT-fs (sda1): error, failed to bmap (inode : 7afd50e4 iblock : 0, err : -5)
    [10768.184285] exFAT-fs (sda1): Filesystem has been set read-only
    ash: write error: Input/output error

(2) Some dentries of the renamed file/directory are not set
    to deleted after removing the file/directory.

exfat_update_parent_info() is a workaround for the wrong parent
directory information being used after renaming. Now that bug is
fixed, this is no longer needed, so remove it.

Fixes: 3ba778edcf48 ("exfat: add inode operations")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - re-enable registration of algorithms
Giovanni Cabiddu [Mon, 9 May 2022 13:34:17 +0000 (14:34 +0100)]
crypto: qat - re-enable registration of algorithms

[ Upstream commit 590555227ecc5178abd71e416ed135ff21624625 ]

Re-enable the registration of algorithms after fixes to (1) use
pre-allocated buffers in the datapath and (2) support the
CRYPTO_TFM_REQ_MAY_BACKLOG flag.

This reverts commit b0261670ad9b616678800b4f033812fe9e8c3535.

Cc: stable@vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - add param check for DH
Giovanni Cabiddu [Mon, 9 May 2022 13:34:15 +0000 (14:34 +0100)]
crypto: qat - add param check for DH

[ Upstream commit 5dff47e18a5e65bcc78c64788275f5d3495f528c ]

Reject requests with a source buffer that is bigger than the size of the
key. This is to prevent a possible integer underflow that might happen
when copying the source scatterlist into a linear buffer.

Cc: stable@vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - add param check for RSA
Giovanni Cabiddu [Mon, 9 May 2022 13:34:14 +0000 (14:34 +0100)]
crypto: qat - add param check for RSA

[ Upstream commit c0c6b9c68e54c01d9cdb85878963ad1f3bb22b41 ]

Reject requests with a source buffer that is bigger than the size of the
key. This is to prevent a possible integer underflow that might happen
when copying the source scatterlist into a linear buffer.

Cc: stable@vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - remove dma_free_coherent() for DH
Giovanni Cabiddu [Mon, 9 May 2022 13:34:13 +0000 (14:34 +0100)]
crypto: qat - remove dma_free_coherent() for DH

[ Upstream commit 1219b8bcc69ac3c04a68f88ddddd1d7eb756569c ]

The functions qat_dh_compute_value() allocates memory with
dma_alloc_coherent() if the source or the destination buffers are made
of multiple flat buffers or of a size that is not compatible with the
hardware.
This memory is then freed with dma_free_coherent() in the context of a
tasklet invoked to handle the response for the corresponding request.

According to Documentation/core-api/dma-api-howto.rst, the function
dma_free_coherent() cannot be called in an interrupt context.

Replace allocations with dma_alloc_coherent() in the function
qat_dh_compute_value() with kmalloc() + dma_map_single().

Cc: stable@vger.kernel.org
Fixes: f113873185ae ("crypto: qat - Add DH support")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - remove dma_free_coherent() for RSA
Giovanni Cabiddu [Mon, 9 May 2022 13:34:12 +0000 (14:34 +0100)]
crypto: qat - remove dma_free_coherent() for RSA

[ Upstream commit 8b41f34c6ad58a77f79c1386529a37170513fdeb ]

After commit 240442c7424b ("dma-mapping: remove CONFIG_DMA_REMAP"), if
the algorithms are enabled, the driver crashes with a BUG_ON while
executing vunmap() in the context of a tasklet. This is due to the fact
that the function dma_free_coherent() cannot be called in an interrupt
context (see Documentation/core-api/dma-api-howto.rst).

The functions qat_rsa_enc() and qat_rsa_dec() allocate memory with
dma_alloc_coherent() if the source or the destination buffers are made
of multiple flat buffers or of a size that is not compatible with the
hardware.
This memory is then freed with dma_free_coherent() in the context of a
tasklet invoked to handle the response for the corresponding request.

Replace allocations with dma_alloc_coherent() in the functions
qat_rsa_enc() and qat_rsa_dec() with kmalloc() + dma_map_single().

Cc: stable@vger.kernel.org
Fixes: d4d2781efa4d ("crypto: qat - Add support for RSA algorithm")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - fix memory leak in RSA
Giovanni Cabiddu [Mon, 9 May 2022 13:34:11 +0000 (14:34 +0100)]
crypto: qat - fix memory leak in RSA

[ Upstream commit ac851d93c24ef0cc961611110a3f88fd1b6a0cb5 ]

When an RSA key represented in form 2 (as defined in PKCS #1 V2.1) is
used, some components of the private key persist even after the TFM is
released.
Replace the explicit calls to free the buffers in qat_rsa_exit_tfm()
with a call to qat_rsa_clear_ctx() which frees all buffers referenced in
the TFM context.

Cc: stable@vger.kernel.org
Fixes: a4e08d2529de ("crypto: qat - Add RSA CRT mode")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - add backlog mechanism
Giovanni Cabiddu [Mon, 9 May 2022 13:34:10 +0000 (14:34 +0100)]
crypto: qat - add backlog mechanism

[ Upstream commit 318a3828e1e48d5c3581778185074b489acde4fe ]

The implementations of the crypto algorithms (aead, skcipher, etc) in
the QAT driver do not properly support requests with the
CRYPTO_TFM_REQ_MAY_BACKLOG flag set. If the HW queue is full, the driver
returns -EBUSY but does not enqueue the request. This can result in
applications like dm-crypt waiting indefinitely for the completion of a
request that was never submitted to the hardware.

Fix this by adding a software backlog queue: if the ring buffer is more
than eighty percent full, then the request is enqueued to a backlog
list and the error code -EBUSY is returned back to the caller.
Requests in the backlog queue are resubmitted at a later time, in the
context of the callback of a previously submitted request.
The request for which -EBUSY is returned is then marked as -EINPROGRESS
once submitted to the HW queues.

The submission loop inside the function qat_alg_send_message() has been
modified to decide which submission policy to use based on the request
flags. If the request does not have the CRYPTO_TFM_REQ_MAY_BACKLOG set,
the previous behaviour has been preserved.

Based on a patch by
Vishnu Das Ramachandran <vishnu.dasx.ramachandran@intel.com>

Cc: stable@vger.kernel.org
Fixes: 469a759ea0f6 ("crypto: qat - Intel(R) QAT crypto interface")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Kyle Sanderson <kyle.leet@gmail.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - refactor submission logic
Giovanni Cabiddu [Mon, 9 May 2022 13:34:09 +0000 (14:34 +0100)]
crypto: qat - refactor submission logic

[ Upstream commit c372d0aefd8e7d00dfd5d256da993bb443e9ce1c ]

All the algorithms in qat_algs.c and qat_asym_algs.c use the same
pattern to submit messages to the HW queues. Move the submission loop
to a new function, qat_alg_send_message(), and share it between the
symmetric and the asymmetric algorithms.

As part of this rework, since the number of retries before returning an
error is inconsistent between the symmetric and asymmetric
implementations, set it to a value that works for both (i.e. 20, was 10
in qat_algs.c and 100 in qat_asym_algs.c)

In addition fix the return code reported when the HW queues are full.
In that case return -ENOSPC instead of -EBUSY.

Including stable in CC since (1) the error code returned if the HW queues
are full is incorrect and (2) to facilitate the backport of the next fix
"crypto: qat - add backlog mechanism".

Cc: stable@vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - use pre-allocated buffers in datapath
Giovanni Cabiddu [Mon, 9 May 2022 13:34:08 +0000 (14:34 +0100)]
crypto: qat - use pre-allocated buffers in datapath

[ Upstream commit 9b78bb19d2e04d22586208b7e0fffd6d366eb867 ]

In order to do DMAs, the QAT device requires that the scatterlist
structures are mapped and translated into a format that the firmware can
understand. This is defined as the composition of a scatter gather list
(SGL) descriptor header, the struct qat_alg_buf_list, plus a variable
number of flat buffer descriptors, the struct qat_alg_buf.

The allocation and mapping of these data structures is done each time a
request is received from the skcipher and aead APIs.
In an OOM situation, this behaviour might lead to a dead-lock if an
allocation fails.

Based on the conversation in [1], increase the size of the aead and
skcipher request contexts to include an SGL descriptor that can handle
a maximum of 4 flat buffers.
If requests exceed 4 entries buffers, memory is allocated dynamically.

[1] https://lore.kernel.org/linux-crypto/20200722072932.GA27544@gondor.apana.org.au/

Cc: stable@vger.kernel.org
Fixes: 469a759ea0f6 ("crypto: qat - Intel(R) QAT crypto interface")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agocrypto: qat - set to zero DH parameters before free
Giovanni Cabiddu [Mon, 9 May 2022 13:19:27 +0000 (14:19 +0100)]
crypto: qat - set to zero DH parameters before free

[ Upstream commit feb3060169ea4a3480ca222abed137e40685314b ]

Set to zero the context buffers containing the DH key before they are
freed.
This is a defense in depth measure that avoids keys to be recovered from
memory in case the system is compromised between the free of the buffer
and when that area of memory (containing keys) gets overwritten.

Cc: stable@vger.kernel.org
Fixes: f113873185ae ("crypto: qat - Add DH support")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoiwlwifi: fw: uefi: add missing include guards
Johannes Berg [Sun, 24 Oct 2021 15:20:26 +0000 (18:20 +0300)]
iwlwifi: fw: uefi: add missing include guards

[ Upstream commit eb130a615b618fe88db551bbf675125639a6312d ]

We still don't use #pragma once in the kernel, but even if
we did it'd be missing. Add the missing include guards.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: bf7c8c1a2cbf ("iwlwifi: move UEFI code to a separate file")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20211024181719.7fc9988ed49b.I87e300fab664047581e51fb9b02744c75320d08c@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agomt76: fix use-after-free by removing a non-RCU wcid pointer
Felix Fietkau [Fri, 25 Mar 2022 21:01:43 +0000 (22:01 +0100)]
mt76: fix use-after-free by removing a non-RCU wcid pointer

[ Upstream commit 10a705e7cee75657cbe5a7ef47d0da8856e32ba9 ]

Fixes an issue caught by KASAN about use-after-free in mt76_txq_schedule
by protecting mtxq->wcid with rcu_lock between mt76_txq_schedule and
sta_info_[alloc, free].

[18853.876689] ==================================================================
[18853.876751] BUG: KASAN: use-after-free in mt76_txq_schedule+0x204/0xaf8 [mt76]
[18853.876773] Read of size 8 at addr ffffffaf989a2138 by task mt76-tx phy0/883
[18853.876786]
[18853.876810] CPU: 5 PID: 883 Comm: mt76-tx phy0 Not tainted 5.10.100-fix-510-56778d365941-kasan #5 0b01fbbcf41a530f52043508fec2e31a4215

[18853.876840] Call trace:
[18853.876861]  dump_backtrace+0x0/0x3ec
[18853.876878]  show_stack+0x20/0x2c
[18853.876899]  dump_stack+0x11c/0x1ac
[18853.876918]  print_address_description+0x74/0x514
[18853.876934]  kasan_report+0x134/0x174
[18853.876948]  __asan_report_load8_noabort+0x44/0x50
[18853.876976]  mt76_txq_schedule+0x204/0xaf8 [mt76 074e03e4640e97fe7405ee1fab547b81c4fa45d2]
[18853.877002]  mt76_txq_schedule_all+0x2c/0x48 [mt76 074e03e4640e97fe7405ee1fab547b81c4fa45d2]
[18853.877030]  mt7921_tx_worker+0xa0/0x1cc [mt7921_common f0875ebac9d7b4754e1010549e7db50fbd90a047]
[18853.877054]  __mt76_worker_fn+0x190/0x22c [mt76 074e03e4640e97fe7405ee1fab547b81c4fa45d2]
[18853.877071]  kthread+0x2f8/0x3b8
[18853.877087]  ret_from_fork+0x10/0x30
[18853.877098]
[18853.877112] Allocated by task 941:
[18853.877131]  kasan_save_stack+0x38/0x68
[18853.877147]  __kasan_kmalloc+0xd4/0xfc
[18853.877163]  kasan_kmalloc+0x10/0x1c
[18853.877177]  __kmalloc+0x264/0x3c4
[18853.877294]  sta_info_alloc+0x460/0xf88 [mac80211]
[18853.877410]  ieee80211_prep_connection+0x204/0x1ee0 [mac80211]
[18853.877523]  ieee80211_mgd_auth+0x6c4/0xa4c [mac80211]
[18853.877635]  ieee80211_auth+0x20/0x2c [mac80211]
[18853.877733]  rdev_auth+0x7c/0x438 [cfg80211]
[18853.877826]  cfg80211_mlme_auth+0x26c/0x390 [cfg80211]
[18853.877919]  nl80211_authenticate+0x6d4/0x904 [cfg80211]
[18853.877938]  genl_rcv_msg+0x748/0x93c
[18853.877954]  netlink_rcv_skb+0x160/0x2a8
[18853.877969]  genl_rcv+0x3c/0x54
[18853.877985]  netlink_unicast_kernel+0x104/0x1ec
[18853.877999]  netlink_unicast+0x178/0x268
[18853.878015]  netlink_sendmsg+0x3cc/0x5f0
[18853.878030]  sock_sendmsg+0xb4/0xd8
[18853.878043]  ____sys_sendmsg+0x2f8/0x53c
[18853.878058]  ___sys_sendmsg+0xe8/0x150
[18853.878071]  __sys_sendmsg+0xc4/0x1f4
[18853.878087]  __arm64_compat_sys_sendmsg+0x88/0x9c
[18853.878101]  el0_svc_common+0x1b4/0x390
[18853.878115]  do_el0_svc_compat+0x8c/0xdc
[18853.878131]  el0_svc_compat+0x10/0x1c
[18853.878146]  el0_sync_compat_handler+0xa8/0xcc
[18853.878161]  el0_sync_compat+0x188/0x1c0
[18853.878171]
[18853.878183] Freed by task 10927:
[18853.878200]  kasan_save_stack+0x38/0x68
[18853.878215]  kasan_set_track+0x28/0x3c
[18853.878228]  kasan_set_free_info+0x24/0x48
[18853.878244]  __kasan_slab_free+0x11c/0x154
[18853.878259]  kasan_slab_free+0x14/0x24
[18853.878273]  slab_free_freelist_hook+0xac/0x1b0
[18853.878287]  kfree+0x104/0x390
[18853.878402]  sta_info_free+0x198/0x210 [mac80211]
[18853.878515]  __sta_info_destroy_part2+0x230/0x2d4 [mac80211]
[18853.878628]  __sta_info_flush+0x300/0x37c [mac80211]
[18853.878740]  ieee80211_set_disassoc+0x2cc/0xa7c [mac80211]
[18853.878851]  ieee80211_mgd_deauth+0x4a4/0x10a0 [mac80211]
[18853.878962]  ieee80211_deauth+0x20/0x2c [mac80211]
[18853.879057]  rdev_deauth+0x7c/0x438 [cfg80211]
[18853.879150]  cfg80211_mlme_deauth+0x274/0x414 [cfg80211]
[18853.879243]  cfg80211_mlme_down+0xe4/0x118 [cfg80211]
[18853.879335]  cfg80211_disconnect+0x218/0x2d8 [cfg80211]
[18853.879427]  __cfg80211_leave+0x17c/0x240 [cfg80211]
[18853.879519]  cfg80211_leave+0x3c/0x58 [cfg80211]
[18853.879611]  wiphy_suspend+0xdc/0x200 [cfg80211]
[18853.879628]  dpm_run_callback+0x58/0x408
[18853.879642]  __device_suspend+0x4cc/0x864
[18853.879658]  async_suspend+0x34/0xf4
[18853.879673]  async_run_entry_fn+0xe0/0x37c
[18853.879689]  process_one_work+0x508/0xb98
[18853.879702]  worker_thread+0x7f4/0xcd4
[18853.879717]  kthread+0x2f8/0x3b8
[18853.879731]  ret_from_fork+0x10/0x30
[18853.879741]
[18853.879757] The buggy address belongs to the object at ffffffaf989a2000
[18853.879757]  which belongs to the cache kmalloc-8k of size 8192
[18853.879774] The buggy address is located 312 bytes inside of
[18853.879774]  8192-byte region [ffffffaf989a2000ffffffaf989a4000)
[18853.879787] The buggy address belongs to the page:
[18853.879807] page:000000004bda2a59 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d89a0
[18853.879823] head:000000004bda2a59 order:3 compound_mapcount:0 compound_pincount:0
[18853.879839] flags: 0x8000000000010200(slab|head)
[18853.879857] raw: 8000000000010200 ffffffffbc89e208 ffffffffb7fb5208 ffffffaec000cc80
[18853.879873] raw: 0000000000000000 0000000000010001 00000001ffffffff 0000000000000000
[18853.879885] page dumped because: kasan: bad access detected
[18853.879896]
[18853.879907] Memory state around the buggy address:
[18853.879922]  ffffffaf989a2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[18853.879935]  ffffffaf989a2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[18853.879948] >ffffffaf989a2100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[18853.879961]                                         ^
[18853.879973]  ffffffaf989a2180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[18853.879986]  ffffffaf989a2200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[18853.879998] ==================================================================

Cc: stable@vger.kernel.org
Reported-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoxhci: Set HCD flag to defer primary roothub registration
Kishon Vijay Abraham I [Tue, 10 May 2022 09:16:30 +0000 (14:46 +0530)]
xhci: Set HCD flag to defer primary roothub registration

[ Upstream commit 7dd69d730ce2735e50fba949740aefb1b7b2957f ]

Set "HCD_FLAG_DEFER_RH_REGISTER" to hcd->flags in xhci_run() to defer
registering primary roothub in usb_add_hcd() if xhci has two roothubs.
This will make sure both primary roothub and secondary roothub will be
registered along with the second HCD.
This is required for cold plugged USB devices to be detected in certain
PCIe USB cards (like Inateck USB card connected to AM64 EVM or J7200 EVM).

This patch has been added and reverted earier as it triggered a race
in usb device enumeration.
That race is now fixed in 5.16-rc3, and in stable back to 5.4
commit 6ae44db5b96f ("usb: hub: Fix locking issues with address0_mutex")
commit 74decaa58c7a ("usb: hub: Fix usb enumeration issue due to address0
race")

[minor rebase change, and commit message update -Mathias]

CC: stable@vger.kernel.org # 5.4+
Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Link: https://lore.kernel.org/r/20220510091630.16564-3-kishon@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoxhci: dbc: Rename xhci_dbc_init and xhci_dbc_exit
Mathias Nyman [Wed, 16 Feb 2022 09:51:47 +0000 (11:51 +0200)]
xhci: dbc: Rename xhci_dbc_init and xhci_dbc_exit

[ Upstream commit 0e5552974ccf90c23076829085694167ebce7120 ]

These names give the impression the functions are related to
module init calls, but are in fact creating and removing the dbc
fake device

Rename them to xhci_create_dbc_dev() and xhci_remove_dbc_dev().

We will need the _init and _exit names for actual dbc module init
and exit calls.

No functional changes

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220216095153.1303105-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoxhci: dbc: create and remove dbc structure in dbgtty driver.
Mathias Nyman [Wed, 16 Feb 2022 09:51:46 +0000 (11:51 +0200)]
xhci: dbc: create and remove dbc structure in dbgtty driver.

[ Upstream commit e0c233be367377704c2fd0cb39a08c74a9d8e058 ]

Turn the dbgtty closer to a device driver by allocating the dbc
structure in its own xhci_dbc_tty_probe() function, and freeing it
in xhci_dbc_tty_remove()

Remove xhci_do_dbc_exit() as its no longer needed.

allocate and create the dbc strcuture in xhci_dbc_tty_probe()

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220216095153.1303105-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoxhci: dbc: refactor xhci_dbc_init()
Mathias Nyman [Wed, 16 Feb 2022 09:51:45 +0000 (11:51 +0200)]
xhci: dbc: refactor xhci_dbc_init()

[ Upstream commit 7bc8c44a928dff329b1796c549ccbaf8a56738fd ]

Refactor xhci_dbc_init(), splitting it into logical
parts closer to the Linux device model.

- Create the fake dbc device, depends on xhci strucure
- Allocate a dbc structure, xhci agnostic
- Call xhci_dbc_tty_probe(), similar to actual probe.

Adjustments to xhci_dbc_exit and xhci_dbc_remove are also needed
as a result to the xhci_dbc_init() changes

Mostly non-functional changes, except for creating the dbc sysfs
entry earlier, together with the dbc structure.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20220216095153.1303105-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agoKVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses
Sean Christopherson [Wed, 2 Feb 2022 00:49:44 +0000 (00:49 +0000)]
KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses

[ Upstream commit 557b4e681631bc05b90d38be5f78dc4715ae2290 ]

Use the recently introduce __try_cmpxchg_user() to emulate atomic guest
accesses via the associated userspace address instead of mapping the
backing pfn into kernel address space.  Using kvm_vcpu_map() is unsafe as
it does not coordinate with KVM's mmu_notifier to ensure the hva=>pfn
translation isn't changed/unmapped in the memremap() path, i.e. when
there's no struct page and thus no elevated refcount.

Fixes: 27c238573dcc ("KVM/X86: Use kvm_vcpu_map in emulator_cmpxchg_emulated")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220202004945.2540433-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/futex: Remove .fixup usage
Peter Zijlstra [Wed, 10 Nov 2021 10:01:11 +0000 (11:01 +0100)]
x86/futex: Remove .fixup usage

[ Upstream commit 2d3315f5807343fcffe86d3bf8d55a311282444a ]

Use the new EX_TYPE_IMM_REG to store -EFAULT into the designated 'ret'
register, this removes the need for anonymous .fixup code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.426016322@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/msr: Remove .fixup usage
Peter Zijlstra [Wed, 10 Nov 2021 10:01:10 +0000 (11:01 +0100)]
x86/msr: Remove .fixup usage

[ Upstream commit c161c9d2e8151f161f23ae4198172d8393fed874 ]

Rework the MSR accessors to remove .fixup usage. Add two new extable
types (to the 4 already existing msr ones) using the new register
infrastructure to record which register should get the error value.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.364084212@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/extable: Extend extable functionality
Peter Zijlstra [Wed, 10 Nov 2021 10:01:09 +0000 (11:01 +0100)]
x86/extable: Extend extable functionality

[ Upstream commit f8966b33a3a9a135d5916f21d4d0f8b1beeb189d ]

In order to remove further .fixup usage, extend the extable
infrastructure to take additional information from the extable entry
sites.

Specifically add _ASM_EXTABLE_TYPE_REG() and EX_TYPE_IMM_REG that
extend the existing _ASM_EXTABLE_TYPE() by taking an additional
register argument and encoding that and an s16 immediate into the
existing s32 type field. This limits the actual types to the first
byte, 255 seem plenty.

Also add a few flags into the type word, specifically CLEAR_AX and
CLEAR_DX which clear the return and extended return register.

Notes:
 - due to the % in our register names it's hard to make it more
   generally usable as arm64 did.
 - the s16 is far larger than used in these patches, future extentions
   can easily shrink this to get more bits.
 - without the bitfield fix this will not compile, because: 0xFF > -1
   and we can't even extract the TYPE field.

[nathanchance: Build fix for clang-lto builds:
 https://lkml.kernel.org/r/20211210234953.3420108-1-nathan@kernel.org
]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211110101325.303890153@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/entry_32: Remove .fixup usage
Peter Zijlstra [Wed, 10 Nov 2021 10:01:08 +0000 (11:01 +0100)]
x86/entry_32: Remove .fixup usage

[ Upstream commit a83e73a18cdaf0d85af97d877098fc852b7bcb6d ]

Where possible, push the .fixup into code, at the tail of functions.

This is hard for macros since they're used in multiple functions,
therefore introduce a new extable handler to pop zeros.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101325.245184699@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agobitfield.h: Fix "type of reg too small for mask" test
Peter Zijlstra [Wed, 10 Nov 2021 10:01:03 +0000 (11:01 +0100)]
bitfield.h: Fix "type of reg too small for mask" test

[ Upstream commit f57d58311e4a7f43fc4fa5d32f1f6d19e037c3f4 ]

The test: 'mask > (typeof(_reg))~0ull' only works correctly when both
sides are unsigned, consider:

 - 0xff000000 vs (int)~0ull
 - 0x000000ff vs (int)~0ull

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20211110101324.950210584@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE
Thomas Gleixner [Wed, 8 Sep 2021 13:29:19 +0000 (15:29 +0200)]
x86/extable: Provide EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE

[ Upstream commit 2cfb6385557d6bebc37493e9f9295b3cfeca06be ]

Provide exception fixup types which can be used to identify fixups which
allow in kernel #MC recovery and make them invoke the existing handlers.

These will be used at places where #MC recovery is handled correctly by the
caller.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210908132525.269689153@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
3 years agox86/extable: Rework the exception table mechanics
Thomas Gleixner [Wed, 8 Sep 2021 13:29:18 +0000 (15:29 +0200)]
x86/extable: Rework the exception table mechanics

[ Upstream commit 33f0f7024d625f94877db3be9a91bd39c769197c ]

The exception table entries contain the instruction address, the fixup
address and the handler address. All addresses are relative. Storing the
handler address has a few downsides:

 1) Most handlers need to be exported

 2) Handlers can be defined everywhere and there is no overview about the
    handler types

 3) MCE needs to check the handler type to decide whether an in kernel #MC
    can be recovered. The functionality of the handler itself is not in any
    way special, but for these checks there need to be separate functions
    which in the worst case have to be exported.

    Some of these 'recoverable' exception fixups are pretty obscure and
    just reuse some other handler to spare code. That obfuscates e.g. the
    #MC safe copy functions. Cleaning that up would require more handlers
    and exports

Rework the exception fixup mechanics by storing a fixup type number instead
of the handler address and invoke the proper handler for each fixup
type. Also teach the extable sort to leave the type field alone.

This makes most handlers static except for special cases like the MCE
MSR fixup and the BPF fixup. This allows to add more types for cleaning up
the obscure places without adding more handler code and exports.

There is a marginal code size reduction for a production config and it
removes _eight_ exported symbols.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>