The recent patch to make afs_getattr consult the server didn't account
for the pseudo-inodes employed by the dynamic root-type afs superblock
not having a volume or a server to access, and thus an oops occurs if
such a directory is stat'd.
Fix this by checking to see if the vnode->volume pointer actually points
anywhere before following it in afs_getattr().
This can be tested by stat'ing a directory in /afs. It may be
sufficient just to do "ls /afs" and the oops looks something like:
commit ab4fd97bb851 ("[PATCH] mips: update IRQ handling for vr41xx") added
a function irq_dispatch, and it'll increase irq_err_count when the get_irq
callback returns a negative value, but increase irq_err_count in get_irq
was not removed.
And also, modpost complains once gpio-vr41xx drivers become modules.
ERROR: modpost: "irq_err_count" [drivers/gpio/gpio-vr41xx.ko] undefined!
So it would be a good idea to remove repetitive increase irq_err_count in
get_irq callback.
Fixes: cc5be02f60c6 ("MIPS: Update VR41xx GPIO driver to use gpiolib") Fixes: ab4fd97bb851 ("[PATCH] mips: update IRQ handling for vr41xx") Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: huhai <huhai@kylinos.cn> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
After setting the sock ktls, update ctx->sk_proto to sock->sk_prot by
tls_update(), so now ctx->sk_proto->close is tls_sk_proto_close(). When
close the sock, tls_sk_proto_close() is called for sock->sk_prot->close
is tls_sk_proto_close(). But ctx->sk_proto->close() will be executed later
in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
occurred. That will trigger the following bug.
=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
<TASK>
tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
Updating a proto which is same with sock->sk_prot is incorrect. Add proto
and sock->sk_prot equality check at the head of tls_update() to fix it.
Fixes: 8e0fd2340797 ("bpf: sockmap/tls, close can race with map free") Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 0c5e1bb756ca ("erspan: auto detect truncated ipv6 packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
of_graph_get_remote_node() returns remote device node pointer with
refcount incremented, we should use of_node_put() on it
when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 6bbe0e33a90b ("drm: convert drivers to use of_graph_get_remote_node") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/488473/ Link: https://lore.kernel.org/r/20220607110841.53889-1-linmq006@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
qopt.latency is __u32, psched_tdiff_t is signed long,
(psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
qopt.latency is always UINT_MAX.
Fix it by using psched_time_t (u64) instead.
Note: confusingly, users have two ways to specify 'latency':
1. normally, via '__u32 latency' in struct tc_netem_qopt;
2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
For the second case, theoretically 'latency' could be negative. This
patch ignores that corner case, since it is broken (i.e. assigning a
negative s64 to __u32) anyways, and should be handled separately.
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: ab0a4d932d7e ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor") Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/ Tested-by: Jonathan Toppins <jtoppins@redhat.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/9400.1655407960@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Even when the eth port is resticted to work with speeds not higher than 1G,
and so the eth driver is requesting the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising for 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Cc: Ondrej Spacek <ondrej.spacek@nxp.com> Fixes: a9894f007b7e1 ("net: phy: aquantia: add support for auto-negotiation configuration") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
A customer reported a request_socket leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup with takes a refcnt on
the socket and that it was finding the request_socket but returning the parent
LISTEN socket via sk_to_full_sk() without decrementing the child request socket
1st, resulting in request_sock slab object leak. This patch retains the
existing behaviour of returning full socks to the caller but it also decrements
the child request_socket if one is present before doing so to prevent the leak.
Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
thanks to Antoine Tenart for the reproducer and patch input.
v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
validate RCU flags on the listen socket so that it balances with bpf_sk_release()
and update comments as per Martin KaFai Lau's suggestion. One small change to
Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
instruction.
Fixes: c0ef450c4e84 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Fixes: aa1f46455534 ("bpf: add skc_lookup_tcp helper") Co-developed-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
RM500K provides 2 mandatory interfaces to Linux host after enumeration.
- /dev/ttyUSB5: this is a serial interface for control path. User needs
to write AT commands to this device node to query status, set APN,
set PIN code, and enable/disable the data connection to 5G network.
- ethX: this is the data path provided as a RNDIS devices. After the
data connection has been established, Linux host can access 5G data
network via this interface.
"RNDIS": RNDIS + ADB + AT (/dev/ttyUSB5) + MODEM COMs
The EM05-G modem has 2 USB configurations that are configurable via the AT
command AT+QCFG="usbnet",[ 0 | 2 ] which make the modem enumerate with
the following interfaces, respectively:
random.c ratelimits how much it warns about uninitialized urandom reads
using __ratelimit(). When the RNG is finally initialized, it prints the
number of missed messages due to ratelimiting.
It has been this way since that functionality was introduced back in
2018. Recently, 13cd7980484c ("random: remove ratelimiting for in-kernel
unseeded randomness") put a bit more stress on the urandom ratelimiting,
which teased out a bug in the implementation.
Specifically, when under pressure, __ratelimit() will print its own
message and reset the count back to 0, making the final message at the
end less useful. Secondly, it does so as a pr_warn(), which apparently
is undesirable for people's CI.
Fortunately, __ratelimit() has the RATELIMIT_MSG_ON_RELEASE flag exactly
for this purpose, so we set the flag.
Fixes: a0596dd97db5 ("random: rate limit unseeded randomness warnings") Cc: stable@vger.kernel.org Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Ron Economos <re@w6rz.net> Tested-by: Ron Economos <re@w6rz.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f569a3ade581 ("dm mirror log: round up region bitmap size to
BITS_PER_LONG") introduced a regression on 64-bit architectures in the
lvm testsuite tests: lvcreate-mirror, mirror-names and vgsplit-operation.
If the device is shrunk, we need to clear log bits beyond the end of the
device. The code clears bits up to a 32-bit boundary and then calculates
lc->sync_count by summing set bits up to a 64-bit boundary (the commit
changed that; previously, this boundary was 32-bit too). So, it was using
some non-zeroed bits in the calculation and this caused misbehavior.
Fix this regression by clearing bits up to BITS_PER_LONG boundary.
Fixes: f569a3ade581 ("dm mirror log: round up region bitmap size to BITS_PER_LONG") Cc: stable@vger.kernel.org Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1. Archives the current era
2. Commits the metadata, as part of the RPC call for archiving the
current era
3. Stops the worker
Until the worker stops, it might write to the metadata again. Moreover,
these writes are not flushed to disk immediately, but are cached by the
dm-bufio client, which writes them back asynchronously.
As a result, the committed metadata of a suspended dm-era device might
not be consistent with the in-core metadata.
In some cases, this can result in the corruption of the on-disk
metadata. Suppose the following sequence of events:
1. Load a new table, e.g. a snapshot-origin table, to a device with a
dm-era table
2. Suspend the device
3. dm-era commits its metadata, but the worker does a few more metadata
writes until it stops, as part of digesting an archived writeset
4. These writes are cached by the dm-bufio client
5. Load the dm-era table to another device.
6. The new instance of the dm-era target loads the committed, on-disk
metadata, which don't include the extra writes done by the worker
after the metadata commit.
7. Resume the new device
8. The new dm-era target instance starts using the metadata
9. Resume the original device
10. The destructor of the old dm-era target instance is called and
destroys the dm-bufio client, which results in flushing the cached
writes to disk
11. These writes might overwrite the writes done by the new dm-era
instance, hence corrupting its metadata.
Fix this by committing the metadata after the worker stops running.
stop_worker uses flush_workqueue to flush the current work. However, the
work item may re-queue itself and flush_workqueue doesn't wait for
re-queued works to finish.
This could result in the worker changing the metadata after they have
been committed, or writing to the metadata concurrently with the commit
in the postsuspend thread.
Use drain_workqueue instead, which waits until the work and all
re-queued works finish.
The DEVICE_BUSY_TIMEOUT value is described in the Reference Manual as:
| Timeout waiting for NAND Ready/Busy or ATA IRQ. Used in WAIT_FOR_READY
| mode. This value is the number of GPMI_CLK cycles multiplied by 4096.
So instead of multiplying the value in cycles with 4096, we have to
divide it by that value. Use DIV_ROUND_UP to make sure we are on the
safe side, especially when the calculated value in cycles is smaller
than 4096 as typically the case.
This bug likely never triggered because any timeout != 0 usually will
do. In my case the busy timeout in cycles was originally calculated as
2408, which multiplied with 4096 is 0x968000. The lower 16 bits were
taken for the 16 bit wide register field, so the register value was
0x8000. With 607663bc3b1d ("mtd: rawnand: gpmi: fix controller timings
setting") however the value in cycles became 2384, which multiplied
with 4096 is 0x950000. The lower 16 bit are 0x0 now resulting in an
intermediate timeout when reading from NAND.
Fixes: 16c1b23db3788 ("mtd: rawnand: gpmi: use core timings instead of an empirical derivation") Cc: stable@vger.kernel.org Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220614083138.3455683-1-s.hauer@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a packet enters the OVS datapath and does not match any existing
flows installed in the kernel flow cache, the packet will be sent to
userspace to be parsed, and a new flow will be created. The kernel and
OVS rely on each other to parse packet fields in the same way so that
packets will be handled properly.
As per the design document linked below, OVS expects all later IPv6
fragments to have nw_proto=44 in the flow key, so they can be correctly
matched on OpenFlow rules. OpenFlow controllers create pipelines based
on this design.
This behavior was changed by the commit in the Fixes tag so that
nw_proto equals the next_header field of the last extension header.
However, there is no counterpart for this change in OVS userspace,
meaning that this field is parsed differently between OVS and the
kernel. This is a problem because OVS creates actions based on what is
parsed in userspace, but the kernel-provided flow key is used as a match
criteria, as described in Documentation/networking/openvswitch.rst. This
leads to issues such as packets incorrectly matching on a flow and thus
the wrong list of actions being applied to the packet. Such changes in
packet parsing cannot be implemented without breaking the userspace.
The offending commit is partially reverted to restore the expected
behavior.
The change technically made sense and there is a good reason that it was
implemented, but it does not comply with the original design of OVS.
If in the future someone wants to implement such a change, then it must
be user-configurable and disabled by default to preserve backwards
compatibility with existing OVS versions.
There is not have Headset Mic verb table in BIOS default.
So, it will have recording issue from headset MIC.
Add the verb table value without jack detect. It will turn on Headset Mic.
Currently the Conexant codec driver sets up the beep NID after calling
snd_hda_gen_parse_auto_config(). It turned out that this results in
the insufficient setup for the beep control, as the generic parser
handles the fake path in snd_hda_gen_parse_auto_config() only if the
beep_nid is set up beforehand.
For dealing with the beep widget properly, call cx_auto_parse_beep()
before snd_hda_gen_parse_auto_config() call.
Like the previous fix for Conexant codec, the beep_nid has to be set
up before calling snd_hda_gen_parse_auto_config(); otherwise it'd miss
the path setup.
Fix the call order for addressing the missing beep setup.
It used to be that mix_interrupt_randomness() would credit 1 bit each
time it ran, and so add_interrupt_randomness() would schedule mix() to
run every 64 interrupts, a fairly arbitrary number, but nonetheless
considered to be a decent enough conservative estimate.
Since 4f280103bfeb ("random: do not use input pool from hard IRQs"),
mix() is now able to credit multiple bits, depending on the number of
calls to add(). This was done for reasons separate from this commit, but
it has the nice side effect of enabling this patch to schedule mix()
less often.
Currently the rules are:
a) Credit 1 bit for every 64 calls to add().
b) Schedule mix() once a second that add() is called.
c) Schedule mix() once every 64 calls to add().
Rules (a) and (c) no longer need to be coupled. It's still important to
have _some_ value in (c), so that we don't "over-saturate" the fast
pool, but the once per second we get from rule (b) is a plenty enough
baseline. So, by increasing the 64 in rule (c) to something larger, we
avoid calling queue_work_on() as frequently during irq storms.
This commit changes that 64 in rule (c) to be 1024, which means we
schedule mix() 16 times less often. And it does *not* need to change the
64 in rule (a).
Fixes: b4d32c9bcc75 ("random: defer fast pool mixing to worker") Cc: stable@vger.kernel.org Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drop support for these ioctls:
* PIO_FONT, PIO_FONTX
* GIO_FONT, GIO_FONTX
* PIO_FONTRESET
As was demonstrated by commit 89f558ba837c (tty: make FONTX ioctl use
the tty pointer they were actually passed), these ioctls are not used
from userspace, as:
1) they used to be broken (set up font on current console, not the open
one) and racy (before the commit above)
2) KDFONTOP ioctl is used for years instead
Note that PIO_FONTRESET is defunct on most systems as VGA_CONSOLE is set
on them for ages. That turns on BROKEN_GRAPHICS_PROGRAMS which makes
PIO_FONTRESET just return an error.
We are removing KD_FONT_FLAG_OLD here as it was used only by these
removed ioctls. kd.h header exists both in kernel and uapi headers, so
we can remove the kernel one completely. Everyone includeing kd.h will
now automatically get the uapi one.
There are now unused definitions of the ioctl numbers and "struct
consolefontdesc" in kd.h, but as it is a uapi header, I am not touching
these.
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers
When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:
(1) If the DMA transfer does not write to every byte in the buffer,
then the unwritten bytes will contain stale data once the transfer
has completed.
(2) If the buffer has a virtual alias in userspace, then stale data
may be visible via this alias during the period between performing
the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.
Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 57c2c1517858 ("tcp: change source port randomizarion at
connect() time"), the table_perturb[] array was introduced and an
index was taken from the port_offset via hash_32(). But it turns
out that hash_32() performs a multiplication while the input here
comes from the output of SipHash in secure_seq, that is well
distributed enough to avoid the need for yet another hash.
Suggested-by: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.
Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.
A live test at 1 million connections per second showed no performance
difference from the previous value.
Reported-by: Moshe Kol <moshe.kol@mail.huji.ac.il> Reported-by: Yossi Gilad <yossi.gilad@mail.huji.ac.il> Reported-by: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il> Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il> Cc: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.
With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il> Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il> Cc: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.
Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Moshe Kol <moshe.kol@mail.huji.ac.il> Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il> Cc: Amit Klein <aksecurity@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
Port Selection Algorithm), a patient attacker could still be able
to collect enough state from an otherwise idle host.
Idea of this patch is to inject some noise, in the
cases __inet_hash_connect() found a candidate in the first
attempt.
This noise should not significantly reduce the collision
avoidance, and should be zero if connection table
is already well used.
Note that this is not implementing RFC 6056 3.3.5
because we think Algorithm 5 could hurt typical
workloads.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Dworken <ddworken@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In systemd systems setting a fixed MAC address through
the "dev_addr" module argument fails systematically.
When checking the MAC address after the interface is created
it always has the same but different MAC address to the one
supplied as argument.
This is partially caused by systemd which by default will
set an internally generated permanent MAC address for interfaces
that are marked as having a randomly generated address.
Commit 38008d64ea50659b ("usb: gadget: u_ether: fix race in
setting MAC address in setup phase") didn't take into account
the fact that the interface must be marked as having a set
MAC address when it's set as module argument.
Fixed by marking the interface with NET_ADDR_SET when
the "dev_addr" module argument is supplied.
Fixes: 38008d64ea50659b ("usb: gadget: u_ether: fix race in setting MAC address in setup phase") Cc: stable@vger.kernel.org Signed-off-by: Marian Postevca <posteuca@mutex.one> Link: https://lore.kernel.org/r/20220603153459.32722-1-posteuca@mutex.one Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The switch to a keyed guest does not require a classic sske as the other
guest CPUs are not accessing the key before the switch is complete.
By using the NQ SSKE things are faster especially with multiple guests.
The existing __lshrti3 was really inefficient, and the other two helpers
are also needed to compile some modules.
Add the missing versions, and export all of the symbols like arm64
already does.
This code is based on the assembly generated by libgcc builds.
This fixes a build break triggered by ubsan:
riscv64-unknown-linux-gnu-ld: lib/ubsan.o: in function `.L2':
ubsan.c:(.text.unlikely+0x38): undefined reference to `__ashlti3'
riscv64-unknown-linux-gnu-ld: ubsan.c:(.text.unlikely+0x42): undefined reference to `__ashrti3'
Signed-off-by: Olof Johansson <olof@lixom.net>
[paul.walmsley@sifive.com: use SYM_FUNC_{START,END} instead of
ENTRY/ENDPROC; note libgcc origin] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
riscv's <vdso/processor.h> uses barrier() so it should include
<asm/barrier.h>
Fixes this build error:
CC [M] drivers/net/ethernet/emulex/benet/be_main.o
In file included from ./include/vdso/processor.h:10,
from ./arch/riscv/include/asm/processor.h:11,
from ./include/linux/prefetch.h:15,
from drivers/net/ethernet/emulex/benet/be_main.c:14:
./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax':
./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration]
14 | barrier();
This happens with a total of 5 networking drivers -- they all use
<linux/prefetch.h>.
rv64 allmodconfig now builds cleanly after this patch.
Fixes fallout from: bf82e747dd2a ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Fixes: 64b54cd2683d ("riscv: use vDSO common flow to reduce the latency of the time-related functions") Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
[sudip: change in old path] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 437a17c0a7a6 ("kprobes: Init kprobes in early_initcall") enabled
using kprobes from early_initcall. Unfortunately at this point the
hardware debug infrastructure is not operational. The OS lock may still
be locked, and the hardware watchpoints may have unknown values when
kprobe enables debug monitors to single-step instructions.
Rather than using hardware single-step, append a BRK instruction after
the instruction to be executed out-of-line.
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new
actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.
After the first ct() action the packet headers are almost fully
re-written. The next ct() tries to re-use the existing nfct entry
and marks the packet as invalid, so it gets dropped later in the
pipeline.
Clearing the cached conntrack entry whenever packet tuple is changed
to avoid the issue.
The flow key should not be cleared though, because we should still
be able to match on the ct_state if the recirculation happens after
the tuple change but before the next ct() action.
in current Linux, MTU policing does not take into account that packets at
the TC ingress have the L2 header pulled. Thus, the same TC police action
(with the same value of tcfp_mtu) behaves differently for ingress/egress.
In addition, the full GSO size is compared to tcfp_mtu: as a consequence,
the policer drops GSO packets even when individual segments have the L2 +
L3 + L4 + payload length below the configured valued of tcfp_mtu.
Improve the accuracy of MTU policing as follows:
- account for mac_len for non-GSO packets at TC ingress.
- compare MTU threshold with the segmented size for GSO packets.
Also, add a kselftest that verifies the correct behavior.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[dcaratti: fix conflicts due to lack of the following commits:
- commit 56e588b63b77 ("net/sched: act_police: add support for
packet-per-second policing")
- commit 11915b314ef6 ("selftests: forwarding: Add tc-police tests")
- commit 9badfde1d429 ("selftests: forwarding: Add tc-police tests for
packets per second")] Link: https://lore.kernel.org/netdev/876d597a0ff55f6ba786f73c5a9fd9eb8d597a03.1644514748.git.dcaratti@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 12 enhanced -Waddress when comparing array address to null [0],
which warns:
drivers/virtio/virtio_pci_common.c: In function ‘vp_del_vqs’:
drivers/virtio/virtio_pci_common.c:257:29: warning: the comparison will always evaluate as ‘true’ for the pointer operand in ‘vp_dev->msix_affinity_masks + (sizetype)((long unsigned int)i * 256)’ must not be NULL [-Waddress]
257 | if (vp_dev->msix_affinity_masks[i])
| ^~~~~~
In fact, the verification is comparing the result of a pointer
arithmetic, the address "msix_affinity_masks + i", which will always
evaluate to true.
Under the hood, free_cpumask_var() calls kfree(), which is safe to pass
NULL, not requiring non-null verification. So remove the verification
to make compiler happy (happy compiler, happy life).
The HP EliteBook 630 is using ALC236 codec which used 0x02 to control mute LED
and 0x01 to control micmute LED. Therefore, add a quirk to make it works.
The HP EliteBook 840 G8 Notebook PC is using ALC236 codec which is
using 0x02 to control mute LED and 0x01 to control micmute LED.
Therefore, add a quirk to make it works.
We capture a NULL pointer issue when resizing a corrupt ext4 image which
is freshly clear resize_inode feature (not run e2fsck). It could be
simply reproduced by following steps. The problem is because of the
resize_inode feature was cleared, and it will convert the filesystem to
meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was
not reduced to zero, so could we mistakenly call reserve_backup_gdb()
and passing an uninitialized resize_inode to it when adding new group
descriptors.
mkfs.ext4 /dev/sda 3G
tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
mount /dev/sda /mnt
resize2fs /dev/sda 8G
The fix is simple, add a check in ext4_resize_begin() to make sure that
the es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.
Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to
be a signed integer so we can correctly check for an error code returned
by dx_make_map().
we can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.
Cc: stable@kernel.org Fixes: 8d80954c43e8 ("ext4: trim allocation requests to group size") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The code in dm-log rounds up bitset_size to 32 bits. It then uses
find_next_zero_bit_le on the allocated region. find_next_zero_bit_le
accesses the bitmap using unsigned long pointers. So, on 64-bit
architectures, it may access 4 bytes beyond the allocated size.
Fix this bug by rounding up bitset_size to BITS_PER_LONG.
This bug was found by running the lvm2 testsuite with kasan.
Not all LSR register flags are preserved across reads. Therefore, LSR
readers must store the non-preserved bits into lsr_save_flags.
This fix was initially mixed into feature commit b0d3cb1b30f4 ("serial:
8250: Handle UART without interrupt on TEMT using em485"). However,
that feature change had a flaw and it was reverted to make room for
simpler approach providing the same feature. The embedded fix got
reverted with the feature change.
Re-add the lsr_save_flags fix and properly mark it's a fix.
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
of_node_put() will check NULL pointer.
Fixes: a6848bdc470f ("USB: gadget driver for LPC32xx") Cc: stable <stable@kernel.org> Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220603140246.64529-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb_create_hcd will alloc memory for hcd, and we should
call usb_put_hcd to free it when platform_get_resource()
fails to prevent memory leak.
goto error2 label instead error1 to fix this.
Fixes: 1a68fdb1dd16 ("usb: dwc2: check return value after calling platform_get_resource()") Cc: stable <stable@kernel.org> Acked-by: Minas Harutyunyan <hminas@synopsys.com> Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220530085413.44068-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adding support for Cinterion device MV31 with Qualcomm
new baseline. Use different PIDs to separate it from
previous base line products.
All interfaces settings keep same as previous.
The expression for setting the size of the allocated bulk TX buffer
(`devpriv->usb_tx_buf`) is calling `usb_endpoint_maxp(devpriv->ep_rx)`,
which is using the wrong endpoint (should be `devpriv->ep_tx`). Fix it.
Fixes: f6d15c807df6 ("comedi: vmk80xx: fix transfer-buffer overflow") Cc: Johan Hovold <johan@kernel.org> Cc: stable@vger.kernel.org # 4.9+ Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20220607171819.4121-1-abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Even though the DW I2C controller reference clock source is requested by
the method devm_clk_get() with non-optional clock requirement the way the
clock handler is used afterwards has a pure optional clock semantic
(though in some circumstances we can get a warning about the clock missing
printed in the system console). There is no point in reimplementing that
functionality seeing the kernel clock framework already supports the
optional interface from scratch. Thus let's convert the platform driver to
using it.
Note by providing this commit we get to fix two problems. The first one
was introduced in commit dd31dff0282d ("i2c: designware: Add support for
an interface clock"). It causes not having the interface clock (pclk)
enabled/disabled in case if the reference clock isn't provided. The second
problem was first introduced in commit 74b8b8ef60dd ("i2c: designware: Do
not require clock when SSCN and FFCN are provided"). Since that
modification the deferred probe procedure has been unsupported in case if
the interface clock isn't ready.
Fixes: dd31dff0282d ("i2c: designware: Add support for an interface clock") Fixes: 74b8b8ef60dd ("i2c: designware: Do not require clock when SSCN and FFCN are provided") Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
of_find_node_by_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 65563d583bfc ("irqchip/gic-v3: Add support for partitioned PPIs") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-6-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
When kcalloc fails, it missing of_node_put() and results in refcount
leak. Fix this by goto out_put_node label.
Fixes: a5cd8e007711 ("irqchip/gic-v3: Dynamically allocate PPI partition descriptors") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-5-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
of_find_matching_node_and_match() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: f17c75ab0609 ("irqchip/gic/realview: Support more RealView DCC variants") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220601080930.31005-2-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
If a function lives in a section other than .text, but .text also exists
in the object, faddr2line may wrongly assume .text. This can result in
comically wrong output. For example:
$ scripts/faddr2line vmlinux.o enter_from_user_mode+0x1c
enter_from_user_mode+0x1c/0x30:
find_next_bit at /home/jpoimboe/git/linux/./include/linux/find.h:40
(inlined by) perf_clear_dirty_counters at /home/jpoimboe/git/linux/arch/x86/events/core.c:2504
Fix it by passing the section name to addr2line, unless the object file
is vmlinux, in which case the symbol table uses absolute addresses.
The branch range checks in ftrace_make_call() and ftrace_make_nop() are
incorrect, erroneously permitting a forwards branch of 128M and
erroneously rejecting a backwards branch of 128M.
This is because both functions calculate the offset backwards,
calculating the offset *from* the target *to* the branch, rather than
the other way around as the later comparisons expect.
If an out-of-range branch were erroeously permitted, this would later be
rejected by aarch64_insn_gen_branch_imm() as branch_imm_common() checks
the bounds correctly, resulting in warnings and the placement of a BRK
instruction. Note that this can only happen for a forwards branch of
exactly 128M, and so the caller would need to be exactly 128M bytes
below the relevant ftrace trampoline.
If an in-range branch were erroeously rejected, then:
* For modules when CONFIG_ARM64_MODULE_PLTS=y, this would result in the
use of a PLT entry, which is benign.
Note that this is the common case, as this is selected by
CONFIG_RANDOMIZE_BASE (and therefore RANDOMIZE_MODULE_REGION_FULL),
which distributions typically seelct. This is also selected by
CONFIG_ARM64_ERRATUM_843419.
* For modules when CONFIG_ARM64_MODULE_PLTS=n, this would result in
internal ftrace failures.
* For core kernel text, this would result in internal ftrace failues.
Note that for this to happen, the kernel text would need to be at
least 128M bytes in size, and typical configurations are smaller tha
this.
Fix this by calculating the offset *from* the branch *to* the target in
both functions.
Fixes: aa77dbc4da98 ("arm64: ftrace: don't validate branch via PLT in ftrace_make_nop()") Fixes: 3b5ecc7bd9fd ("arm64: ftrace: add support for far branches to dynamic ftrace") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Will Deacon <will@kernel.org> Tested-by: "Ivan T. Ivanov" <iivanov@suse.de> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220614080944.1349146-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Both RIF and ACL flow counters use a 24-bit SW-managed counter address to
communicate which counter they want to bind.
In a number of Spectrum FW releases, binding a RIF counter is broken and
slices the counter index to 16 bits. As a result, on Spectrum-2 and above,
no more than about 410 RIF counters can be effectively used. This
translates to 205 netdevices for which L3 HW stats can be enabled. (This
does not happen on Spectrum-1, because there are fewer counters available
overall and the counter index never exceeds 16 bits.)
Binding counters to ACLs does not have this issue. Therefore reorder the
counter allocation scheme so that RIF counters come first and therefore get
lower indices that are below the 16-bit barrier.
After PF reset and ethtool -t there was call trace in dmesg
sometimes leading to panic. When there was some time, around 5
seconds, between reset and test there were no errors.
Problem was that pf reset calls i40e_vsi_close in prep_for_reset
and ethtool -t calls i40e_vsi_close in diag_test. If there was not
enough time between those commands the second i40e_vsi_close starts
before previous i40e_vsi_close was done which leads to crash.
Add check to diag_test if pf is in reset and don't start offline
tests if it is true.
Add netif_info("testing failed") into unhappy path of i40e_diag_test()
Fixes: f6d7cf4d42bd ("i40e: Disable offline diagnostics if VFs are enabled") Fixes: 7cd7df842ad1 ("i40e: Fix ethtool offline diagnostic with netqueues") Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Procedure of configure tc flower filters erroneously allows to create
filters on TC0 where unfiltered packets are also directed by default.
Issue was caused by insufficient checks of hw_tc parameter specifying
the hardware traffic class to pass matching packets to.
Fix checking hw_tc parameter which blocks creation of filters on TC0.
Fixes: e17fea180694 ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
arch/x86/kernel/cpu/mshyperv.c is never compiled as modular.
(CONFIG_HYPERVISOR_GUEST is boolean)
Fixes: 05cb67b930fe ("clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220606050238.4162200-1-masahiroy@kernel.org Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit changes the default Kconfig values of RANDOM_TRUST_CPU and
RANDOM_TRUST_BOOTLOADER to be Y by default. It does not change any
existing configs or change any kernel behavior. The reason for this is
several fold.
As background, I recently had an email thread with the kernel
maintainers of Fedora/RHEL, Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine,
SUSE, and Void as recipients. I noted that some distros trust RDRAND,
some trust EFI, and some trust both, and I asked why or why not. There
wasn't really much of a "debate" but rather an interesting discussion of
what the historical reasons have been for this, and it came up that some
distros just missed the introduction of the bootloader Kconfig knob,
while another didn't want to enable it until there was a boot time
switch to turn it off for more concerned users (which has since been
added). The result of the rather uneventful discussion is that every
major Linux distro enables these two options by default.
While I didn't have really too strong of an opinion going into this
thread -- and I mostly wanted to learn what the distros' thinking was
one way or another -- ultimately I think their choice was a decent
enough one for a default option (which can be disabled at boot time).
I'll try to summarize the pros and cons:
Pros:
- The RNG machinery gets initialized super quickly, and there's no
messing around with subsequent blocking behavior.
- The bootloader mechanism is used by kexec in order for the prior
kernel to initialize the RNG of the next kernel, which increases
the entropy available to early boot daemons of the next kernel.
- Previous objections related to backdoors centered around
Dual_EC_DRBG-like kleptographic systems, in which observing some
amount of the output stream enables an adversary holding the right key
to determine the entire output stream.
This used to be a partially justified concern, because RDRAND output
was mixed into the output stream in varying ways, some of which may
have lacked pre-image resistance (e.g. XOR or an LFSR).
But this is no longer the case. Now, all usage of RDRAND and
bootloader seeds go through a cryptographic hash function. This means
that the CPU would have to compute a hash pre-image, which is not
considered to be feasible (otherwise the hash function would be
terribly broken).
- More generally, if the CPU is backdoored, the RNG is probably not the
realistic vector of choice for an attacker.
- These CPU or bootloader seeds are far from being the only source of
entropy. Rather, there is generally a pretty huge amount of entropy,
not all of which is credited, especially on CPUs that support
instructions like RDRAND. In other words, assuming RDRAND outputs all
zeros, an attacker would *still* have to accurately model every single
other entropy source also in use.
- The RNG now reseeds itself quite rapidly during boot, starting at 2
seconds, then 4, then 8, then 16, and so forth, so that other sources
of entropy get used without much delay.
- Paranoid users can set random.trust_{cpu,bootloader}=no in the kernel
command line, and paranoid system builders can set the Kconfig options
to N, so there's no reduction or restriction of optionality.
- It's a practical default.
- All the distros have it set this way. Microsoft and Apple trust it
too. Bandwagon.
Cons:
- RDRAND *could* still be backdoored with something like a fixed key or
limited space serial number seed or another indexable scheme like
that. (However, it's hard to imagine threat models where the CPU is
backdoored like this, yet people are still okay making *any*
computations with it or connecting it to networks, etc.)
- RDRAND *could* be defective, rather than backdoored, and produce
garbage that is in one way or another insufficient for crypto.
- Suggesting a *reduction* in paranoia, as this commit effectively does,
may cause some to question my personal integrity as a "security
person".
- Bootloader seeds and RDRAND are generally very difficult if not all
together impossible to audit.
Keep in mind that this doesn't actually change any behavior. This
is just a change in the default Kconfig value. The distros already are
shipping kernels that set things this way.
Ard made an additional argument in [1]:
We're at the mercy of firmware and micro-architecture anyway, given
that we are also relying on it to ensure that every instruction in
the kernel's executable image has been faithfully copied to memory,
and that the CPU implements those instructions as documented. So I
don't think firmware or ISA bugs related to RNGs deserve special
treatment - if they are broken, we should quirk around them like we
usually do. So enabling these by default is a step in the right
direction IMHO.
In [2], Phil pointed out that having this disabled masked a bug that CI
otherwise would have caught:
A clean 5.15.45 boots cleanly, whereas a downstream kernel shows the
static key warning (but it does go on to boot). The significant
difference is that our defconfigs set CONFIG_RANDOM_TRUST_BOOTLOADER=y
defining that on top of multi_v7_defconfig demonstrates the issue on
a clean 5.15.45. Conversely, not setting that option in a
downstream kernel build avoids the warning
When rx_flag == MTK_RX_FLAGS_HWLRO,
rx_data_len = MTK_MAX_LRO_RX_LENGTH(4096 * 3) > PAGE_SIZE.
netdev_alloc_frag is for alloction of page fragment only.
Reference to other drivers and Documentation/vm/page_frags.rst
Branch to use __get_free_pages when ring->frag_size > PAGE_SIZE.
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be
overflow. To fix, we can follow what udpv6 does and subtract the
transhdrlen from the max.
Similar to the handling of play_deferred in commit 888b5c605018
("Bluetooth: btusb: Fix memory leak in play_deferred"), we thought
a patch might be needed here as well.
Currently usb_submit_urb is called directly to submit deferred tx
urbs after unanchor them.
So the usb_giveback_urb_bh would failed to unref it in usb_unanchor_urb
and cause memory leak.
Put those urbs in tx_anchor to avoid the leak, and also fix the error
handling.
Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220607083230.6182-1-xiaohuizhang@ruc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The reference must be released when device_register(&vm_cmdline_parent)
failed. Add the corresponding 'put_device()' in the error handling path.
Signed-off-by: chengkaitao <pilgrimtao@gmail.com>
Message-Id: <20220602005542.16489-1-chengkaitao@didiglobal.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix missing resource cleanup (when '(--i) == 0') for error case in
ipr_alloc_mem() and skip incorrect resource cleanup (when '(--i) == 0') for
error case in ipr_request_other_msi_irqs() because variable i started from
1.
Link: https://lore.kernel.org/r/20220529153456.4183738-4-cgxu519@mykernel.net Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
NVMe Asynchronous Event Request commands have no command timeout value per
specifications.
Set WQE option to allow a reduced FLUSH polling rate for I/O error
detection specifically for nvme_admin_async_event commands.
Link: https://lore.kernel.org/r/20220603174329.63777-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After issuing a LIP, a specific target vendor does not ACC the FLOGI that
lpfc sends. However, it does send its own FLOGI that lpfc ACCs. The
target then establishes the port IDs by sending a PLOGI. lpfc PLOGI_ACCs
and starts the RPI registration for DID 0x000001. The target then sends a
LOGO to the fabric DID. lpfc is currently treating the LOGO from the
fabric DID as a link down and cleans up all the ndlps. The ndlp for DID
0x000001 is put back into NPR and discovery stops, leaving the port in
stuck in bypassed mode.
Change lpfc behavior such that if a LOGO is received for the fabric DID in
PT2PT topology skip the lpfc_linkdown_port() routine and just move the
fabric DID back to NPR.
Link: https://lore.kernel.org/r/20220603174329.63777-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
vcpuHint has been expanded to 16 bit on host to enable routing to more
CPUs. Guest side should align with the change. This change has been tested
with hosts with 8-bit and 16-bit vcpuHint, on both platforms host side can
get correct value.
Currently wm_adsp_fw_put() returns 0 rather than 1 when updating the value
of the control, meaning that no event is generated to userspace. Fix this
by setting the default return value to 1, the code already exits early with
a return value of 0 if the value is unchanged.
Currently the put() method for the deemphasis control returns 0 when a new
value is written to the control even if the value changed, meaning events
are not generated. Fix this, skip the work of updating the value when it is
unchanged and then return 1 after having done so.
In an unlikely (and probably wrong?) case that the 'ppi' parameter of
ata_host_alloc_pinfo() points to an array starting with a NULL pointer,
there's going to be a kernel oops as the 'pi' local variable won't get
reassigned from the initial value of NULL. Initialize 'pi' instead to
'&ata_dummy_port_info' to fix the possible kernel oops for good...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A couple of the SX volume controls specify 0x84 as the lowest volume
value, however the correct value from the datasheet is 0x44. The
datasheet don't include spaces in the value it displays as binary so
this was almost certainly just a typo reading 1000100.
Limit the error msg to avoid flooding the console. If you have a lot of
threads hitting this at once, they could have already gotten passed the
dma_debug_disabled() check before they get to the point of allocation
failure, resulting in quite a lot of this error message spamming the
log. Use pr_err_once() to limit that.
Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
We tried to enable the audio on an imx6sx EVB with the codec nau8822,
after setting the internal PLL fractional parameters, the audio still
couldn't work and the there was no sdma irq at all.
After checking with the section "8.1.1 Phase Locked Loop (PLL) Design
Example" of "NAU88C22 Datasheet Rev 0.6", we found we need to
turn off the PLL before programming fractional parameters and turn on
the PLL after programming.
After this change, the audio driver could record and play sound and
the sdma's irq is triggered when playing or recording.
Cc: David Lin <ctlin0@nuvoton.com> Cc: John Hsu <kchsu0@nuvoton.com> Cc: Seven Li <wtli@nuvoton.com> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20220530040151.95221-2-hui.wang@canonical.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
After looking into this issue, I find the buggy address belongs
to the task stack region. It seems KASAN has something wrong.
I look into the code of __get_wchan in x86 architecture and
find the same issue has been resolved by the commit 81539dfdd4b7 ("x86/mm, kasan: Silence KASAN warnings in get_wchan()").
The solution could be applied to powerpc architecture too.
As Andrey Ryabinin said, get_wchan() is racy by design, it may
access volatile stack of running task, thus it may access
redzone in a stack frame and cause KASAN to warn about this.
Use READ_ONCE_NOCHECK() to silence these warnings.
Reported-by: Wanming Hu <huwanming@huaweil.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121014418.155675-1-heying24@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Rather than accounting in bytes and multiplying (shifting), we can just
account in bits and avoid the shift. The main motivation for this is
there are other patches in flux that expand this code a bit, and
avoiding the duplication of "* 8" everywhere makes things a bit clearer.
Cc: stable@vger.kernel.org Fixes: 2aea1841598f ("random: credit architectural init the exact amount") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>