If the ext4 inode does not have xattr space, 0 is returned in the
get_max_inline_xattr_value_size function. Otherwise, the function returns
a negative value when the inode does not contain EXT4_STATE_XATTR.
Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
Hulk Robot reported a issue:
==================================================================
BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500
Write of size 4105 at addr ffff8881675ef5f4 by task syz-executor.0/7092
Above issue may happen as follows:
-------------------------------------
ext4_xattr_set
ext4_xattr_set_handle
ext4_xattr_ibody_find
>> s->end < s->base
>> no EXT4_STATE_XATTR
>> xattr_check_inode is not executed
ext4_xattr_ibody_set
ext4_xattr_set_entry
>> size_t min_offs = s->end - s->base
>> UAF in memcpy
we can easily reproduce this problem with the following commands:
mkfs.ext4 -F /dev/sda
mount -o debug_want_extra_isize=128 /dev/sda /mnt
touch /mnt/file
setfattr -n user.cat -v `seq -s z 4096|tr -d '[:digit:]'` /mnt/file
In ext4_xattr_ibody_find, we have the following assignment logic:
header = IHDR(inode, raw_inode)
= raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize
is->s.base = IFIRST(header)
= header + sizeof(struct ext4_xattr_ibody_header)
is->s.end = raw_inode + s_inode_size
In the calculation formula, all values except s_inode_size and
i_extra_size are fixed values. When i_extra_size is the maximum value
s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8.
The value overflows. As a result, the preceding issue is triggered when
memcpy is executed.
Therefore, when finding xattr or setting xattr, check whether
there is space for storing xattr in the inode to resolve this issue.
Cc: stable@kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
When adding an xattr to an inode, we must ensure that the inode_size is
not less than EXT4_GOOD_OLD_INODE_SIZE + extra_isize + pad. Otherwise,
the end position may be greater than the start position, resulting in UAF.
Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220616021358.2504451-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback. Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages(). That function
removes any corresponding delayed allocated blocks from the extent
status tree. If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of an new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.
Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem. Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.
Cc: stable@kernel.org Reported-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
When doing an online resize, the on-disk superblock on-disk wasn't
updated. This means that when the file system is unmounted and
remounted, and the on-disk overhead value is non-zero, this would
result in the results of statfs(2) to be incorrect.
This was partially fixed by Commits 5bf43d031edc ("ext4: fix overhead
calculation to account for the reserved gdt blocks"), 0371b6196528
("ext4: force overhead calculation if the s_overhead_cluster makes no
sense"), and fdaf209c1c0f ("ext4: update the cached overhead value in
the superblock").
However, since it was too expensive to forcibly recalculate the
overhead for bigalloc file systems at every mount, this didn't fix the
problem for bigalloc file systems. This commit should address the
problem when resizing file systems with the bigalloc feature enabled.
Since -Warray-bounds checks the destination size from the type of given
pointer, __assign_rel_str() macro gets warned because it passes the
pointer to the 'u32' field instead of 'trace_event_raw_*' data structure.
Pass the data address calculated from the 'trace_event_raw_*' instead of
'u32' __rel_loc field.
Link: https://lkml.kernel.org/r/20220125233154.dac280ed36944c0c2fe6f3ac@kernel.org Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ This did not fix the warning, but is still a nice clean up ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add '__rel_loc' using trace event macros. These macros are usually
not used in the kernel, except for testing purpose.
This also add "rel_" variant of macros for dynamic_array string,
and bitmask.
Link: https://lkml.kernel.org/r/163757342119.510314.816029622439099016.stgit@devnote2 Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a KASAN warning in raid_resume when running the lvm test
lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks
is greater than rs->raid_disks, so the loop touches one entry beyond
the allocated length.
There is this warning when using a kernel with the address sanitizer
and running this testsuite:
https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid
==================================================================
BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319
CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9c
print_address_description.constprop.0+0x1f/0x1e0
print_report.cold+0x55/0x244
kasan_report+0xc9/0x100
raid_status+0x1747/0x2820 [dm_raid]
dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
table_load+0x35c/0x630 [dm_mod]
ctl_ioctl+0x411/0x630 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
__x64_sys_ioctl+0x12a/0x1a0
do_syscall_64+0x5b/0x80
The warning is caused by reading conf->max_nr_stripes in raid_status. The
code in raid_status reads mddev->private, casts it to struct r5conf and
reads the entry max_nr_stripes.
However, if we have different raid type than 4/5/6, mddev->private
doesn't point to struct r5conf; it may point to struct r0conf, struct
r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
of these structs to struct r5conf, we will be reading invalid memory
and KASAN warns about it.
Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6.
Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and
only if the MSR exists (according to the guest vCPU model). KVM has very
misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and
attempts to force the nVMX MSR settings to match the vPMU model, i.e. to
hide/expose the control based on whether or not the MSR exists from the
guest's perspective.
KVM's modifications fail to handle the scenario where the vPMU is hidden
from the guest _after_ being exposed to the guest, e.g. by userspace
doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any
KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a
recognized vPMU, i.e. KVM will leave the bits in the allow state and then
ultimately reject the MSR load and WARN.
KVM should not force the VMX MSRs in the first place. KVM taking control
of the MSRs was a misguided attempt at mimicking what commit 0ac421831741
("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled",
2018-10-01) did for MPX. However, the MPX commit was a workaround for
another KVM bug and not something that should be imitated (and it should
never been done in the first place).
In other words, KVM's ABI _should_ be that userspace has full control
over the MSRs, at which point triggering the WARN that loading the MSR
must not fail is trivial.
The intent of the WARN is still valid; KVM has consistency checks to
ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The
problem is that '0' must be considered a valid value at all times, and so
the simple/obvious solution is to just not actually load the MSR when it
does not exist. It is userspace's responsibility to provide a sane vCPU
model, i.e. KVM is well within its ABI and Intel's VMX architecture to
skip the loads if the MSR does not exist.
Add a helper to check of the guest PMU has PERF_GLOBAL_CTRL, which is
unintuitive _and_ diverges from Intel's architecturally defined behavior.
Even worse, KVM currently implements the check using two different (but
equivalent) checks, _and_ there has been at least one attempt to add a
_third_ flavor.
Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits
as reserved if there is no guest vPMU. The nVMX VM-Entry consistency
checks do not check for a valid vPMU prior to consuming the masks via
kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask
to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the
actual MSR load will be rejected by intel_is_valid_msr()).
Fixes: 0b1741ba4689 ("KVM: Expose a version 2 architectural PMU to a guests") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mask value of fixed counter control register should be dynamic
adjusted with the number of fixed counters. This patch introduces a
variable that includes the reserved bits of fixed counter control
registers. This is a generic code refactoring.
Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The existing logic in KVM to support guests calling H_RANDOM only works
on Power8, because it looks for an RNG in the device tree, but on Power9
we just use darn.
In addition the existing code needs to work in real mode, so we have the
special cased powernv_get_random_real_mode() to deal with that.
Instead just have KVM call ppc_md.get_random_seed(), and do the real
mode check inside of there, that way we use whatever RNG is available,
including darn on Power9.
Fixes: 1746d4c9c4c7 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com>
[mpe: Rebase on previous commit, update change log appropriately] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220727143219.2684192-2-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a problem with the current revision checks in
is_cppc_supported() that they essentially prevent the CPPC support
from working if a new _CPC package format revision being a proper
superset of the v3 and only causing _CPC to return a package with more
entries (while retaining the types and meaning of the entries defined by
the v3) is introduced in the future and used by the platform firmware.
In that case, as long as the number of entries in the _CPC return
package is at least CPPC_V3_NUM_ENT, it should be perfectly fine to
use the v3 support code and disregard the additional package entries
added by the new package format revision.
For this reason, drop is_cppc_supported() altogether, put the revision
checks directly into acpi_cppc_processor_probe() so they are easier to
follow and rework them to take the case mentioned above into account.
Fixes: 0ba5a50084d3 ("ACPI / CPPC: Add support for CPPC v3") Cc: 4.18+ <stable@vger.kernel.org> # 4.18+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 066cad67b8db seemingly inadvertently moved the code responsible
for flagging the filesystem as having BIG_METADATA to a place where
setting the flag was essentially lost. This means that
filesystems created with kernels containing this bug (starting with 5.15)
can potentially be mounted by older (pre-3.4) kernels. In reality
chances for this happening are low because there are other incompat
flags introduced in the mean time. Still the correct behavior is to set
INCOMPAT_BIG_METADATA flag and persist this in the superblock.
Fixes: 066cad67b8db ("btrfs: fix upper limit for max_inline for page size 64K") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If you try to force a chunk allocation, but you race with another chunk
allocation, you will end up waiting on the chunk allocation that just
occurred and then allocate another chunk. If you have many threads all
doing this at once you can way over-allocate chunks.
Fix this by resetting force to NO_FORCE, that way if we think we need to
allocate we can, otherwise we don't force another chunk allocation if
one is already happening.
Reviewed-by: Filipe Manana <fdmanana@suse.com> CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While we debug the issue, we found running fstests generic/551 on 5GB
non-zoned null_blk device in the emulated zoned mode also had a
similar hung issue.
Also, we can reproduce the same symptom with an error injected
cow_file_range() setup.
The hang occurs when cow_file_range() fails in the middle of
allocation. cow_file_range() called from do_allocation_zoned() can
split the give region ([start, end]) for allocation depending on
current block group usages. When btrfs can allocate bytes for one part
of the split regions but fails for the other region (e.g. because of
-ENOSPC), we return the error leaving the pages in the succeeded regions
locked. Technically, this occurs only when @unlock == 0. Otherwise, we
unlock the pages in an allocated region after creating an ordered
extent.
Considering the callers of cow_file_range(unlock=0) won't write out
the pages, we can unlock the pages on error exit from
cow_file_range(). So, we can ensure all the pages except @locked_page
are unlocked on error case.
In summary, cow_file_range now behaves like this:
- page_started == 1 (return value)
- All the pages are unlocked. IO is started.
- unlock == 1
- All the pages except @locked_page are unlocked in any case
- unlock == 0
- On success, all the pages are locked for writing out them
- On failure, all the pages except @locked_page are unlocked
Fixes: 0415e2d9446e ("btrfs: zoned: introduce dedicated data write path for zoned filesystems") CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In our test of iocost, we encountered some list add/del corruptions of
inner_walk list in ioc_timer_fn.
The reason can be described as follows:
cpu 0 cpu 1
ioc_qos_write ioc_qos_write
ioc = q_to_ioc(queue);
if (!ioc) {
ioc = kzalloc();
ioc = q_to_ioc(queue);
if (!ioc) {
ioc = kzalloc();
...
rq_qos_add(q, rqos);
}
...
rq_qos_add(q, rqos);
...
}
When the io.cost.qos file is written by two cpus concurrently, rq_qos may
be added to one disk twice. In that case, there will be two iocs enabled
and running on one disk. They own different iocgs on their active list. In
the ioc_timer_fn function, because of the iocgs from two iocs have the
same root iocg, the root iocg's walk_list may be overwritten by each other
and this leads to list add/del corruptions in building or destroying the
inner_walk list.
And so far, the blk-rq-qos framework works in case that one instance for
one type rq_qos per queue by default. This patch make this explicit and
also fix the crash above.
The csdlock_debug kernel-boot parameter is parsed by the
early_param() function csdlock_debug(). If set, csdlock_debug()
invokes static_branch_enable() to enable csd_lock_wait feature, which
triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and
CONFIG_SPARSEMEM_VMEMMAP=n.
With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in
static_key_enable() and returns NULL, resulting in a NULL dereference
because mem_section is initialized only later in sparse_init().
This is also a problem for powerpc because early_param() functions
are invoked earlier than jump_label_init(), also resulting in
static_key_enable() failures. These failures cause the warning "static
key 'xxx' used before call to jump_label_init()".
Thus, early_param is too early for csd_lock_wait to run
static_branch_enable(), so changes it to __setup to fix these.
Fixes: e600706be131 ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging") Cc: stable@vger.kernel.org Reported-by: Chen jingwen <chenjingwen6@huawei.com> Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The rng's random_init() function contributes the real time to the rng at
boot time, so that events can at least start in relation to something
particular in the real world. But this clock might not yet be set that
point in boot, so nothing is contributed. In addition, the relation
between minor clock changes from, say, NTP, and the cycle counter is
potentially useful entropic data.
This commit addresses this by mixing in a time stamp on calls to
settimeofday and adjtimex. No entropy is credited in doing so, so it
doesn't make initialization faster, but it is still useful input to
have.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ensure that the fid's iounit field is set to zero when a new fid is
created. Certain 9P operations, such as OPEN and CREATE, allow the
server to reply with an iounit size which the client code assigns to the
p9_fid struct shortly after the fid is created by p9_fid_create(). On
the other hand, an XATTRWALK operation doesn't allow for the server to
specify an iounit value. The iounit field of the newly allocated p9_fid
struct remained uninitialized in that case. Depending on allocation
patterns, the iounit value could have been something reasonable that was
carried over from previously freed fids or, in the worst case, could
have been arbitrary values from non-fid related usages of the memory
location.
The bug was detected in the Windows Subsystem for Linux 2 (WSL2) kernel
after the uninitialized iounit field resulted in the typical sequence of
two getxattr(2) syscalls, one to get the size of an xattr and another
after allocating a sufficiently sized buffer to fit the xattr value, to
hit an unexpected ERANGE error in the second call to getxattr(2). An
uninitialized iounit field would sometimes force rsize to be smaller
than the xattr value size in p9_client_read_once() and the 9P server in
WSL refused to chunk up the READ on the attr_fid and, instead, returned
ERANGE to the client. The virtfs server in QEMU seems happy to chunk up
the READ and this problem goes undetected there.
Fault inject on pool metadata device reports:
BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80
Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950
CPU: 7 PID: 950 Comm: dmsetup Tainted: G W 5.19.0-rc6 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_address_description.constprop.0.cold+0xeb/0x3f4
kasan_report.cold+0xe6/0x147
dm_pool_register_metadata_threshold+0x40/0x80
pool_ctr+0xa0a/0x1150
dm_table_add_target+0x2c8/0x640
table_load+0x1fd/0x430
ctl_ioctl+0x2c4/0x5a0
dm_ctl_ioctl+0xa/0x10
__x64_sys_ioctl+0xb3/0xd0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This can be easily reproduced using:
echo offline > /sys/block/sda/device/state
dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10
dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0"
If a metadata commit fails, the transaction will be aborted and the
metadata space maps will be destroyed. If a DM table reload then
happens for this failed thin-pool, a use-after-free will occur in
dm_sm_register_threshold_callback (called from
dm_pool_register_metadata_threshold).
Fix this by in dm_pool_register_metadata_threshold() by returning the
-EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr()
with a new error message: "Error registering metadata threshold".
Fixes: 1fb0bf620f327 ("dm thin: generate event when metadata threshold passed") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2eafec7364f3 ("s390/kexec_file: Signature verification prototype")
adds support for KEXEC_SIG verification with keys from platform keyring
but the built-in keys and secondary keyring are not used.
Add support for the built-in keys and secondary keyring as x86 does.
dm-writecache has the capability to limit the number of writeback jobs
in progress. However, this feature was off by default. As such there
were some out-of-memory crashes observed when lowering the low
watermark while the cache is full.
This commit enables writeback limit by default. It is set to 256MiB or
1/16 of total system memory, whichever is smaller.
Add support for some of the Brainboxes PCIe (PX) range of
serial cards, including the PX-101, PX-235/PX-246,
PX-203/PX-257, PX-260/PX-701, PX-310, PX-313,
PX-320/PX-324/PX-376/PX-387, PX-335/PX-346, PX-368, PX-420,
PX-803 and PX-846.
Oxford Semiconductor PCIe (Tornado) 950 serial port devices are driven
by a fixed 62.5MHz clock input derived from the 100MHz PCI Express clock.
We currently drive the device using its default oversampling rate of 16
and the clock prescaler disabled, consequently yielding the baud base of 3906250. This base is inadequate for some of the high-speed baud rates
such as 460800bps, for which the closest rate possible can be obtained
by dividing the baud base by 8, yielding the baud rate of 488281.25bps,
which is off by 5.9638%. This is enough for data communication to break
with the remote end talking actual 460800bps, where missed stop bits
have been observed.
We can do better however, by taking advantage of a reduced oversampling
rate, which can be set to any integer value from 4 to 16 inclusive by
programming the TCR register, and by using the clock prescaler, which
can be set to any value from 1 to 63.875 in increments of 0.125 in the
CPR/CPR2 register pair. The prescaler has to be explicitly enabled
though by setting bit 7 in the MCR or otherwise it is bypassed (in the
enhanced mode that we enable) as if the value of 1 was used.
Make use of these features then as follows:
- Set the baud base to 15625000, reflecting the minimum oversampling
rate of 4 with the clock prescaler and divisor both set to 1.
- Override the `set_mctrl' and set the MCR shadow there so as to have
MCR[7] always set and have the 8250 core propagate these settings.
- Override the `get_divisor' handler and determine a good combination of
parameters by using a lookup table with predetermined value pairs of
the oversampling rate and the clock prescaler and finding a pair that
divides the input clock such that the quotient, when rounded to the
nearest integer, deviates the least from the exact result. Calculate
the clock divisor accordingly.
Scale the resulting oversampling rate (only by powers of two) if
possible so as to maximise it, reducing the divisor accordingly, and
avoid a divisor overflow for very low baud rates by scaling the
oversampling rate and/or the prescaler even if that causes some
accuracy loss.
Also handle the historic spd_cust feature so as to allow one to set
all the three parameters manually to arbitrary values, by keeping the
low 16 bits for the divisor and then putting TCR in bits 19:16 and
CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as
to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 33.875.
This preserves compatibility with any existing setups, that is where
requesting a custom divisor that only has any bits set among the low
16 the oversampling rate of 16 and the clock prescaler of 33.875 will
be used as with the original 8250.
Finally abuse the `frac' argument to store the determined bit patterns
for the TCR, CPR and CPR2 registers.
- Override the `set_divisor' handler so as to set the TCR, CPR and CPR2
registers from the `frac' value supplied. Set the divisor as usual.
With the baud base set to 15625000 and the unsigned 16-bit UART_DIV_MAX
limitation imposed by `serial8250_get_baud_rate' standard baud rates
below 300bps become unavailable in the regular way, e.g. the rate of
200bps requires the baud base to be divided by 78125 and that is beyond
the unsigned 16-bit range. The historic spd_cust feature can still be
used to obtain such rates if so required.
See Documentation/tty/device_drivers/oxsemi-tornado.rst for more details.
The EndRun PTP/1588 dual serial port device is based on the Oxford
Semiconductor OXPCIe952 UART device with the PCI vendor:device ID set
for EndRun Technologies and uses the same sequence to determine the
number of ports available. Despite that we have duplicate code
specific to the EndRun device.
Remove redundant code then and factor out OxSemi Tornado device
detection.
Currently the Gen2 port in IPQ8074 will cause the system to hang as it
accesses DBI registers in qcom_pcie_init_2_3_3(), and those are only
accesible after phy_power_on().
Move the DBI read/writes to a new qcom_pcie_post_init_2_3_3(), which is
executed after phy_power_on().
Link: https://lore.kernel.org/r/20220623155004.688090-1-robimarko@gmail.com Fixes: eb37071aabf3 ("PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code") Signed-off-by: Robert Marko <robimarko@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously we iterated over AER stat *names*, e.g.,
aer_correctable_error_string[32], but the actual stat *counters* may not be
that large, e.g., pdev->aer_stats->dev_cor_errs[16], which means that we
printed junk in the sysfs stats files.
Iterate over the stat counter arrays instead of the names to avoid this
junk.
Also, added a build time check to make sure all
counters have entries in strings array.
after converting the type of the first argument (@nr, bit number)
of arch_test_bit() from `long` to `unsigned long`[0].
Under certain conditions (for example, when ACPI NUMA is disabled
via command line), pxm_to_node() can return %NUMA_NO_NODE (-1).
It is valid 'magic' number of NUMA node, but not valid bit number
to use in bitops.
node_online() eventually descends to test_bit() without checking
for the input, assuming it's on caller side (which might be good
for perf-critical tasks). There, -1 becomes %ULONG_MAX which leads
to an insane array index when calculating bit position in memory.
For now, add an explicit check for @node being not %NUMA_NO_NODE
before calling test_bit(). The actual logics didn't change here
at all.
Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR. The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs. I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case. Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.
Fixes: 5624ec74455f ("KVM: Add MCE support") Fixes: f143e7d581a8 ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
MC1_CTL to ignore single-bit ECC data errors. Single-bit ECC data
errors are always correctable and thus are safe to ignore because they
are informational in nature rather than signaling a loss of data
integrity.
Prior to this patch, these guests would crash upon writing MC1_CTL,
with resultant error messages like the following:
This patch refactors the SCSI paths to use SLI-4 as the primary interface.
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
Link: https://lore.kernel.org/r/20220225022308.16486-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Convert the SLI4 fast and slow paths to use native SLI4 wqe constructs
instead of iocb SLI3-isms.
Includes the following:
- Create simple get_xxx and set_xxx routines to wrapper access to common
elements in both SLI3 and SLI4 commands - allowing calling routines to
avoid sli-rev-specific structures to access the elements.
- using the wqe in the job structure as the primary element
- use defines from SLI-4, not SLI-3
- Removal of iocb to wqe conversion from fast and slow path
- Add below routines to handle fast path
lpfc_prep_embed_io - prepares the wqe for fast path
lpfc_wqe_bpl2sgl - manages bpl to sgl conversion
lpfc_sli_wqe2iocb - converts a WQE to IOCB for SLI-3 path
- Add lpfc_sli3_iocb2wcqecmpl in completion path to convert an SLI-3
iocb completion to wcqe completion
- Refactor some of the code that works on both revs for clarity
Link: https://lore.kernel.org/r/20220225022308.16486-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure.
This is a "common" structure but many of the components refer to sli-rev
specific entities which can lead the developer astray as to what they
actually mean, should be set to, or when they should be used.
This first patch prepares the lpfc_iocbq structure so that elements common
to both SLI3 and SLI4 data paths are more appropriately named, making it
clear they apply generically.
Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually
generic to the paths are renamed to 'cmd':
- iocb_flag is renamed to cmd_flag
- lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag
- fabric_iocb_cmpl is renamed to fabric_cmd_cmpl
- wait_iocb_cmpl is renamed to wait_cmd_cmpl
- iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl
- rsvd2 member is renamed to num_bdes due to pre-existing usage
The structure name itself will retain the iocb reference as changing to a
more relevant "job" or "cmd" title induces many hundreds of line changes
for only a name change.
lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use
in the SLI3 path only.
Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.
There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.
Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.
The following modifications are made:
- A new flag is set in PCI error entrypoints so the driver can track being
called by that path.
- An interlock is added in the SLI hw error path and the PCI error path
such that only one of the paths proceeds with the teardown logic.
- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
cmds in cases of hw error.
- If entering the SLI port re-init calls, a case where SLI error teardown
was quick and beat the PCI calls now reporting error, check whether the
SLI port is still live on the PCI bus.
- In the PCI reset code to bring the adapter back, recheck the IRQ
settings. Different checks for SLI3 vs SLI4.
- In I/O completions, that may be called as part of the cleanup or
underway just before the hw error, check the state of the adapter. If
in error, shortcut handling that would expect further adapter
completions as the hw error won't be sending them.
- In routines waiting on I/O completions, which may have been in progress
prior to the hw error, detect the device is being torn down and abort
from their waits and just give up. This points to a larger issue in the
driver on ref-counting for data structures, as it doesn't have
ref-counting on q and port structures. We'll do this fix for now as it
would be a major rework to be done differently.
- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
failed back due to hw error.
- In I/O buf allocation, done at the start of new I/Os, check hw state and
fail if hw error.
Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When scpi probe fails, at any point, we need to ensure that the scpi_info
is not set and will remain NULL until the probe succeeds. If it is not
taken care, then it could result use-after-free as the value is exported
via get_scpi_ops() and could refer to a memory allocated via devm_kzalloc()
but freed when the probe fails.
Commit 2bdbf5a23dc6 ("smsc95xx: add phylib support") amended
smsc95xx_resume() to call phy_init_hw(). That function waits for the
device to runtime resume even though it is placed in the runtime resume
path, causing a deadlock.
The problem is that phy_init_hw() calls down to smsc95xx_mdiobus_read(),
which never uses the _nopm variant of usbnet_read_cmd().
Commit 0fe16173902d ("usbnet: smsc95xx: add reset_resume function with
reset operation") causes a similar deadlock on resume if the device was
already runtime suspended when entering system sleep:
That's because the commit introduced smsc95xx_reset_resume(), which
calls down to smsc95xx_reset(), which neglects to use _nopm accessors.
Fix by auto-detecting whether a device access is performed by the
suspend/resume task_struct and use the _nopm variant if so. This works
because the PM core guarantees that suspend/resume callbacks are run in
task context.
Link status of SMSC LAN95xx chips is polled once per second, even though
they're capable of signaling PHY interrupts through the MAC layer.
Forward those interrupts to the PHY driver to avoid polling. Benefits
are reduced bus traffic, reduced CPU overhead and quicker interface
bringup.
Polling was introduced in 2016 by commit c0bb32545eb4 ("usbnet:
smsc95xx: fix link detection for disabled autonegotiation").
Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
hence couldn't detect link-up events when auto-negotiation was disabled.
The proper solution would have been to enable the ENERGYON interrupt
instead of polling.
Since then, PHY handling was moved from the LAN95xx driver to the SMSC
PHY driver with commit 2bdbf5a23dc6 ("smsc95xx: add phylib support").
That PHY driver is capable of link detection with auto-negotiation
disabled because it enables the ENERGYON interrupt.
Note that signaling interrupts through the MAC layer not only works with
the integrated PHY, but also with an external PHY, provided its
interrupt pin is attached to LAN95xx's nPHY_INT pin.
In the unlikely event that the interrupt pin of an external PHY is
attached to a GPIO of the SoC (or not connected at all), the driver can
be amended to retrieve the irq from the PHY's of_node.
To forward PHY interrupts to phylib, it is not sufficient to call
phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run
so that PHY interrupts are cleared. That's because according to page
119 of the LAN950x datasheet, "The source of this interrupt is a level.
The interrupt persists until it is cleared in the PHY."
Therefore, create an IRQ domain with a single IRQ for the PHY. In the
future, the IRQ domain may be extended to support the 11 GPIOs on the
LAN95xx.
Normally the PHY interrupt should be masked until the PHY driver has
cleared it. However masking requires a (sleeping) USB transaction and
interrupts are received in (non-sleepable) softirq context. I decided
not to mask the interrupt at all (by using the dummy_irq_chip's noop
->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec
intervals and normally that's sufficient to wake the PHY driver's IRQ
thread and have it clear the interrupt. If it does take longer, worst
thing that can happen is the IRQ thread is woken again. No big deal.
Because PHY interrupts are now perpetually enabled, there's no need to
selectively enable them on suspend. So remove all invocations of
smsc95xx_enable_phy_wakeup_interrupts().
In smsc95xx_resume(), move the call of phy_init_hw() before
usbnet_resume() (which restarts the status URB) to ensure that the PHY
is fully initialized when an interrupt is handled.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective Cc: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the
MAC full duplex mode and PHY flow control registers based on cached data
in struct phy_device:
smsc95xx_status() # raises EVENT_LINK_RESET
usbnet_deferred_kevent()
smsc95xx_link_reset() # uses cached data in phydev
Simultaneously, phylib polls link status once per second and updates
that cached data:
phy_state_machine()
phy_check_link_status()
phy_read_status()
lan87xx_read_status()
genphy_read_status() # updates cached data in phydev
If smsc95xx_link_reset() wins the race against genphy_read_status(),
the registers may be updated based on stale data.
E.g. if the link was previously down, phydev->duplex is set to
DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even
though genphy_read_status() may update it to DUPLEX_FULL afterwards.
PHY interrupts are currently only enabled on suspend to trigger wakeup,
so the impact of the race is limited, but we're about to enable them
perpetually.
Avoid the race by delaying execution of smsc95xx_link_reset() until
phy_state_machine() has done its job and calls back via
smsc95xx_handle_link_change().
Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib
picks up link status changes through polling. So drop the declaration
of a ->link_reset() callback.
Note that the semicolon on a line by itself added in smsc95xx_status()
is a placeholder for a function call which will be added in a subsequent
commit. That function call will actually handle the INT_ENP_PHY_INT_
interrupt.
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver
attempts to clear the signaled interrupts by writing "all ones" to the
Interrupt Status Register.
However the driver only ever enables a single type of interrupt, namely
the PHY Interrupt. And according to page 119 of the LAN950x datasheet,
its bit in the Interrupt Status Register is read-only. There's no other
way to clear it than in a separate PHY register:
vc4_drv isn't necessarily under the /soc node in DT as it is a
virtual device, but it is the one that does the allocations.
The DMA addresses are consumed by primarily the HVS or V3D, and
those require VideoCore cache alias address mapping, and so will be
under /soc.
During probe find the a suitable device node for HVS or V3D,
and adopt the DMA configuration of that node.
The WD22TB4 Thunderbolt dock at least will revert its DP_MAX_LINK_RATE
from HBR3 to HBR2 after system suspend/resume if the DP_DP13_DPCD_REV
registers are not read subsequently also as required.
Fix this by reading DP_DP13_DPCD_REV registers as well, matching what is
done during connector detection. While at it also fix up the same call
in drm_dp_mst_dump_topology().
Cc: Lyude Paul <lyude@redhat.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5292 Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Cc: <stable@vger.kernel.org> # v5.14+ Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220614094537.885472-1-imre.deak@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
BLAKE2s has no currently known use as an shash. Just remove all of this
unnecessary plumbing. Removing this shash was something we talked about
back when we were making BLAKE2s a built-in, but I simply never got
around to doing it. So this completes that project.
Importantly, this fixs a bug in which the lib code depends on
crypto_simd_disabled_for_test, causing linker errors.
Also add more alignment tests to the selftests and compare SIMD and
non-SIMD compression functions, to make up for what we lose from
testmgr.c.
Reported-by: gaochao <gaochao49@huawei.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Fixes: c63117175cce ("lib/crypto: blake2s: include as built-in") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
To comply with the panel sequence, hold the mipi signal to LP00 before
the dcs cmds transmission, and pull the mipi signal high from LP00 to
LP11 until the start of the dcs cmds transmission.
The normal panel timing is :
(1) pp1800 DC pull up
(2) avdd & avee AC pull high
(3) lcm_reset pull high -> pull low -> pull high
(4) Pull MIPI signal high (LP11) -> initial code -> send video data
(HS mode)
The power-off sequence is reversed.
If dsi is not in cmd mode, then dsi will pull the mipi signal high in
the mtk_output_dsi_enable function. The delay in lane_ready func is
the reaction time of dsi_rx after pulling up the mipi signal.
Fixes: 391bb3ed9f63 ("drm/mediatek: mtk_dsi: Use the drm_panel_bridge API") Link: https://patchwork.kernel.org/project/linux-mediatek/patch/1653012007-11854-4-git-send-email-xinlei.lee@mediatek.com/ Cc: <stable@vger.kernel.org> # 5.10.x: c6d694d8bfbd: drm/mediatek: Modify dsi funcs to atomic operations Cc: <stable@vger.kernel.org> # 5.10.x: 23c09cfcb9f1: drm/mediatek: Separate poweron/poweroff from enable/disable and define new funcs Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Jitao Shi <jitao.shi@mediatek.com> Signed-off-by: Xinlei Lee <xinlei.lee@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com> Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
trace_spmi_write_begin() and trace_spmi_read_end() both call
memcpy() with a length of "len + 1". This leads to one extra
byte being read beyond the end of the specified buffer. Fix
this out-of-bound memory access by using a length of "len"
instead.
Here is a KASAN log showing the issue:
BUG: KASAN: stack-out-of-bounds in trace_event_raw_event_spmi_read_end+0x1d0/0x234
Read of size 2 at addr ffffffc0265b7540 by task thermal@2.0-ser/1314
...
Call trace:
dump_backtrace+0x0/0x3e8
show_stack+0x2c/0x3c
dump_stack_lvl+0xdc/0x11c
print_address_description+0x74/0x384
kasan_report+0x188/0x268
kasan_check_range+0x270/0x2b0
memcpy+0x90/0xe8
trace_event_raw_event_spmi_read_end+0x1d0/0x234
spmi_read_cmd+0x294/0x3ac
spmi_ext_register_readl+0x84/0x9c
regmap_spmi_ext_read+0x144/0x1b0 [regmap_spmi]
_regmap_raw_read+0x40c/0x754
regmap_raw_read+0x3a0/0x514
regmap_bulk_read+0x418/0x494
adc5_gen3_poll_wait_hs+0xe8/0x1e0 [qcom_spmi_adc5_gen3]
...
__arm64_sys_read+0x4c/0x60
invoke_syscall+0x80/0x218
el0_svc_common+0xec/0x1c8
...
addr ffffffc0265b7540 is located in stack of task thermal@2.0-ser/1314 at offset 32 in frame:
adc5_gen3_poll_wait_hs+0x0/0x1e0 [qcom_spmi_adc5_gen3]
this frame has 1 object:
[32, 33) 'status'
Memory state around the buggy address: ffffffc0265b7400: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ffffffc0265b7480: 04 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc0265b7500: 00 00 00 00 f1 f1 f1 f1 01 f3 f3 f3 00 00 00 00
^ ffffffc0265b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0265b7600: f1 f1 f1 f1 01 f2 07 f2 f2 f2 01 f3 00 00 00 00
==================================================================
Fixes: 5c52d253058a ("spmi: add command tracepoints for SPMI") Cc: stable@vger.kernel.org Reviewed-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: David Collins <quic_collinsd@quicinc.com> Link: https://lore.kernel.org/r/20220627235512.2272783-1-quic_collinsd@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Validate mount_lock seqcount as soon as we cross into mount in RCU
mode. Sure, ->mnt_root is pinned and will remain so until we
do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
the mount_lock had been touched, but we might run into a hard error
(e.g. -ENOENT) before trying to unlazy. And it's possible to end
up with RCU pathwalk racing with rename() and umount() in a way
that would fail with -ENOENT while non-RCU pathwalk would've
succeeded with any timings.
Once upon a time we hadn't needed that, but analysis had been subtle,
brittle and went out of window as soon as RENAME_EXCHANGE had been
added.
It's narrow, hard to hit and won't get you anything other than
stray -ENOENT that could be arranged in much easier way with the
same priveleges, but it's a bug all the same.
Commit 52a323cd661b ("posix-cpu-timers: Store a reference to a pid not a
task") started looking up tasks by PID when deleting a CPU timer.
When a non-leader thread calls execve, it will switch PIDs with the leader
process. Then, as it calls exit_itimers, posix_cpu_timer_del cannot find
the task because the timer still points out to the old PID.
That means that armed timers won't be disarmed, that is, they won't be
removed from the timerqueue_list. exit_itimers will still release their
memory, and when that list is later processed, it leads to a
use-after-free.
Clean up the timers from the de-threaded task before freeing them. This
prevents a reported use-after-free.
Fixes: 52a323cd661b ("posix-cpu-timers: Store a reference to a pid not a task") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220809170751.164716-1-cascardo@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Solution is to send lease break ack immediately even in case of
deferred close handles to avoid lease break request timing out
and let deferred closed handle gets closed as scheduled.
Later patches could optimize cases where we then close some
of these handles sooner for the cases where lease break is to 'none'
Cc: stable@kernel.org Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bitops compile-time optimization series revealed one more
problem in olpc-xo1-sci.c:send_ebook_state(), resulted in GCC
warnings:
arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state':
arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state)
| ^~
arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning
Despite this code working as intended, this redundant double
negation of boolean value, together with comparing to `char`
with no explicit conversion to bool, makes compilers think
the author made some unintentional logical mistakes here.
Make it the other way around and negate the char instead
to silence the warnings.
Fixes: 4cf3fea223f2 ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches") Cc: stable@vger.kernel.org # 3.5+ Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: kernel test robot <lkp@intel.com> Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix kprobes to update kcb (kprobes control block) status flag to
KPROBE_HIT_SSDONE even if the kp->post_handler is not set.
This bug may cause a kernel panic if another INT3 user runs right
after kprobes because kprobe_int3_handler() misunderstands the
INT3 is kprobe's single stepping INT3.
When a ftrace_bug happens (where ftrace fails to modify a location) it is
helpful to have what was at that location as well as what was expected to
be there.
But with the conversion to text_poke() the variable that assigns the
expected for debugging was dropped. Unfortunately, I noticed this when I
needed it. Add it back.
Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home Cc: "x86@kernel.org" <x86@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: cdfebad7bc9b ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:
Similar to the Jmp2Ret mitigation, if the code on the sibling thread
cannot be trusted, software should set STIBP to 1 or disable SMT to
ensure SMT safety when using this mitigation.
So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.
[ bp: Remove the "we" and remove "[AMD]" applicability parameter which
doesn't work here. ]
When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.
Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com Fixes: d5638e660741 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up. Fix this by accepting RSCN when
the device is not online.
Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com Fixes: d5638e660741 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Clear wait for mailbox interrupt flag to prevent stale mailbox:
Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...
To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.
A direct attach tape device, when gets swapped with another, was not
discovered. Fix this by looking at loop map and reinitialize link if there
are devices present.
Case (1):
The only waiter on wka_port->completion_wq is zfcp_fc_wka_port_get()
trying to open a WKA port. As such it should only be woken up by WKA port
*open* responses, not by WKA port close responses.
Case (2):
A close WKA port response coming in just after having sent a new open WKA
port request and before blocking for the open response with wait_event()
in zfcp_fc_wka_port_get() erroneously renders the wait_event a NOP
because the close handler overwrites wka_port->status. Hence the
wait_event condition is erroneously true and it does not enter blocking
state.
With non-negligible probability, the following time space sequence happens
depending on timing without this fix:
So we erroneously end up with no automatic port scan. This is a big problem
when it happens during boot. The timing is influenced by v3.19 commit 46f4d10334e9 ("zfcp: auto port scan resiliency").
Fix it by fully mutually excluding zfcp_fc_wka_port_get() and
zfcp_fc_wka_port_offline(). For that to work, we make the latter block
until we got the response for a close WKA port. In order not to penalize
the system workqueue, we move wka_port->work to our own adapter workqueue.
Note that before v2.6.30 commit da8de36448e2 ("[SCSI] zfcp: Set WKA-port to
offline on adapter deactivation"), zfcp did block in
zfcp_fc_wka_port_offline() as well, but with a different condition.
While at it, make non-functional cleanups to improve code reading in
zfcp_fc_wka_port_get(). If we cannot send the WKA port open request, don't
rely on the subsequent wait_event condition to immediately let this case
pass without blocking. Also don't want to rely on the additional condition
handling the refcount to be skipped just to finally return with -EIO.
Link: https://lore.kernel.org/r/20220729162529.1620730-1-maier@linux.ibm.com Fixes: 9300c0db8f59 ("[SCSI] zfcp: attach and release SAN nameserver port on demand") Cc: <stable@vger.kernel.org> #v2.6.28+ Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After ufshcd_wl_shutdown() set device power off and link off,
ufshcd_shutdown() could turn off clock/power. Also remove
pm_runtime_get_sync.
The reason why it is safe to remove pm_runtime_get_sync() is because:
- ufshcd_wl_shutdown() -> pm_runtime_get_sync() will resume hba->dev too.
- device resume(turn on clk/power) is not required, even if device is in
RPM_SUSPENDED.
Link: https://lore.kernel.org/r/20220727030526.31022-1-peter.wang@mediatek.com Fixes: 1799d66d7dd4 ("scsi: ufs: core: Enable power management for wlun") Cc: <stable@vger.kernel.org> # 5.15.x Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the function s3fb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
In the function arkfb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
In the function vt8623fb_set_par(), the value of 'screen_size' is
calculated by the user input. If the user provides the improper value,
the value of 'screen_size' may larger than 'info->screen_size', which
may cause the following bug:
If a file has FI_COMPRESS_RELEASED, all writes for it should not be
allowed. However, as of now, in case of compress_mode=user, writes
triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly,
which could crash that file.
To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already
has FI_COMPRESS_RELEASED flag.
Since commit 7c1c191e44c1 ("f2fs: let's allow compression for mmap files"),
it has been allowed to compress mmap files. However, in compress_mode=user,
it is not allowed yet. To keep the same concept in both compress_modes,
f2fs_ioc_(de)compress_file() should also allow it.
Let's remove checking mmap files in f2fs_ioc_(de)compress_file() so that
the compression for mmap files is also allowed in compress_mode=user.
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With CONFIG_PREEMPTION disabled, arch/x86/entry/thunk_$(BITS).o becomes
an empty object file.
With some old versions of binutils (i.e., 2.35.90.20210113-1ubuntu1) the
GNU assembler doesn't generate a symbol table for empty object files and
objtool fails with the following error when a valid symbol table cannot
be found:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table
To prevent this from happening, build thunk_$(BITS).o only if
CONFIG_PREEMPTION is enabled.
Commit 9c1854c40af7 ("sched/core: Optimize ttwu() spinning on p->on_cpu")
optimises ttwu by queueing a task that is descheduling on the wakelist,
but does not check if the task descheduling is still allowed to run on that CPU.
In this warning, the problematic task is a workqueue rescue thread which
checks if the rescue is for a per-cpu workqueue and running on the wrong CPU.
While this is early in boot and it should be possible to create workers,
the rescue thread may still used if the MAYDAY_INITIAL_TIMEOUT is reached
or MAYDAY_INTERVAL and on a sufficiently large machine, the rescue
thread is being used frequently.
Tracing confirmed that the task should have migrated properly using the
stopper thread to handle the migration. However, a parallel wakeup from udev
running on another CPU that does not share CPU cache observes p->on_cpu and
uses task_cpu(p), queues the task on the old CPU and triggers the warning.
Check that the wakee task that is descheduling is still allowed to run
on its current CPU and if not, wait for the descheduling to complete
and select an allowed CPU.
Wakelist can help avoid cache bouncing and offload the overhead of waker
cpu. So far, using wakelist within the same llc only happens on
WF_ON_CPU, and this limitation could be removed to further improve
wakeup performance.
The commit 531eac291491 ("sched: Only queue remote wakeups when
crossing cache boundaries") disabled queuing tasks on wakelist when
the cpus share llc. This is because, at that time, the scheduler must
send IPIs to do ttwu_queue_wakelist. Nowadays, ttwu_queue_wakelist also
supports TIF_POLLING, so this is not a problem now when the wakee cpu is
in idle polling.
Benefits:
Queuing the task on idle cpu can help improving performance on waker cpu
and utilization on wakee cpu, and further improve locality because
the wakee cpu can handle its own rq. This patch helps improving rt on
our real java workloads where wakeup happens frequently.
Consider the normal condition (CPU0 and CPU1 share same llc)
Before this patch:
We see CPU0 can finish its work earlier. It only needs to put task to
wakelist and return.
While CPU1 is idle, so let itself handle its own runqueue data.
This patch brings no difference about IPI.
This patch only takes effect when the wakee cpu is:
1) idle polling
2) idle not polling
For 1), there will be no IPI with or without this patch.
For 2), there will always be an IPI before or after this patch.
Before this patch: waker cpu will enqueue task and check preempt. Since
"idle" will be sure to be preempted, waker cpu must send a resched IPI.
After this patch: waker cpu will put the task to the wakelist of wakee
cpu, and send an IPI.
Benchmark:
We've tested schbench, unixbench, and hachbench on both x86 and arm64.
On x86 (Intel Xeon Platinum 8269CY):
schbench -m 2 -t 8
hackbench -g 1 -l 100000
before after
Time 4.217 2.916
Our patch has improvement on schbench, hackbench
and Pipe-based Context Switching of unixbench
when there exists idle cpus,
and no obvious regression on other tests of unixbench.
This can help improve rt in scenes where wakeup happens frequently.
The commit 5378097a078d ("sched/core: Offload wakee task activation if it
the wakee is descheduling") checked rq->nr_running <= 1 to avoid task
stacking when WF_ON_CPU.
Per the ordering of writes to p->on_rq and p->on_cpu, observing p->on_cpu
(WF_ON_CPU) in ttwu_queue_cond() implies !p->on_rq, IOW p has gone through
the deactivate_task() in __schedule(), thus p has been accounted out of
rq->nr_running. As such, the task being the only runnable task on the rq
implies reading rq->nr_running == 0 at that point.
A build with -D_FORTIFY_SOURCE=2 enabled will produce the following warnings:
sysfs.c:63:30: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 255 [-Wformat-truncation=]
snprintf(filepath, 256, "%s/%s", path, filename);
^~
Bump up the buffer to PATH_MAX which is the limit and account for all of
the possible NUL and separators that could lead to exceeding the
allocated buffer sizes.
Fixes: e8658e3e2d63 ("tools/thermal: Introduce tmon, a tool for thermal subsystem") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since the user can control the arguments of the ioctl() from the user
space, under special arguments that may result in a divide-by-zero bug
in:
drivers/video/fbdev/arkfb.c:784: ark_set_pixclock(info, (hdiv * info->var.pixclock) / hmul);
with hdiv=1, pixclock=1 and hmul=2 you end up with (1*1)/2 = (int) 0.
and then in:
drivers/video/fbdev/arkfb.c:504: rv = dac_set_freq(par->dac, 0, 1000000000 / pixclock);
we'll get a division-by-zero.
arch/x86/mm/numa.c: In function ‘cpumask_of_node’:
arch/x86/mm/numa.c:916:39: warning: the comparison will always evaluate as ‘false’ for the address of ‘node_to_cpumask_map’ will never be NULL [-Waddress]
916 | if (node_to_cpumask_map[node] == NULL) {
| ^~
node_to_cpumask_map is of type cpumask_var_t[].
When CONFIG_CPUMASK_OFFSTACK is set, cpumask_var_t is typedef'd to a
pointer for dynamic allocation, else to an array of one element. The
"wicked game" can be checked on line 700 of include/linux/cpumask.h.
The original code in debug_cpumask_set_cpu() and cpumask_of_node() were
probably written by the original authors with CONFIG_CPUMASK_OFFSTACK=y
(i.e. dynamic allocation) in mind, checking if the cpumask was available
via a direct NULL check.
When CONFIG_CPUMASK_OFFSTACK is not set, GCC gives the above warning
while compiling the kernel.
Fix that by using cpumask_available(), which does the NULL check when
CONFIG_CPUMASK_OFFSTACK is set, otherwise returns true. Use it wherever
such checks are made.
Conditional definitions of cpumask_available() can be found along with
the definition of cpumask_var_t. Check the cpumask.h reference mentioned
above.
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:
Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.
Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value >= nr_cpu_ids.
Fixes: d7c5c8a758f2 ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Both functions are doing almost the same, that is checking if admission
control is still respected.
With exclusive cpusets, dl_task_can_attach() checks if the destination
cpuset (i.e. its root domain) has enough CPU capacity to accommodate the
task.
dl_cpu_busy() checks if there is enough CPU capacity in the cpuset in
case the CPU is hot-plugged out.
dl_task_can_attach() is used to check if a task can be admitted while
dl_cpu_busy() is used to check if a CPU can be hotplugged out.
Make dl_cpu_busy() able to deal with a task and use it instead of
dl_task_can_attach() in task_can_attach().
Since commit ec79799c5a22 ("faddr2line: Fix overlapping text section
failures, the sequel"), faddr2line is completely broken on arm64.
For some reason, on arm64, the vmlinux ELF object file type is ET_DYN
rather than ET_EXEC. Check for both when determining whether the object
is vmlinux.
Modules and vmlinux.o have type ET_REL on all arches.
Fixes: ec79799c5a22 ("faddr2line: Fix overlapping text section failures, the sequel") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/dad1999737471b06d6188ce4cdb11329aa41682c.1658426357.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The recent change to the PHB numbering logic has a logic error in the
handling of "ibm,opal-phbid".
When an "ibm,opal-phbid" property is present, &prop is written to and
ret is set to zero.
The following call to of_alias_get_id() is skipped because ret == 0.
But then the if (ret >= 0) is true, and the body of that if statement
sets prop = ret which throws away the value that was just read from
"ibm,opal-phbid".
Fix the logic by only doing the ret >= 0 check in the of_alias_get_id()
case.
Fixes: b614cf23313a ("powerpc/pci: Prefer PCI domain assignment via DT 'linux,pci-domain' and alias") Reviewed-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220802105723.1055178-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
It's possible that this kernel has been kexec'd from a kernel that
enabled bus lock detection, or (hypothetically) BIOS/firmware has set
DEBUGCTLMSR_BUS_LOCK_DETECT.
Disable bus lock detection explicitly if not wanted.
Fixes: b817401736f3 ("x86/traps: Handle #DB for bus lock") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
kernel_text_address() treats ftrace_trampoline, kprobe_insn_slot
and bpf_text_address as valid kprobe addresses - which is not ideal.
These text areas are removable and changeable without any notification
to kprobes, and probing on them can trigger unexpected behavior:
https://lkml.org/lkml/2022/7/26/1148
Considering that jump_label and static_call text are already
forbiden to probe, kernel_text_address() should be replaced with
core_kernel_text() and is_module_text_address() to check other text
areas which are unsafe to kprobe.
[ mingo: Rewrote the changelog. ]
Fixes: 83b8b035c5dc ("kprobes, extable: Identify kprobes trampolines as kernel text area") Fixes: c2d5f4374f26 ("bpf: make jited programs visible in traces") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20220801033719.228248-1-chenzhongjin@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
With the patch (c436ad9f4a2a0926 ("perf symbol: Correct address for bss
symbols")) I see lots of:
dso__load_sym_internal: failed to find program header for symbol:
Lorg/apache/fop/fo/FObj;bind(Lorg/apache/fop/fo/PropertyList;)V
st_value: 0x40
Fixes: c436ad9f4a2a0926 ("perf symbol: Correct address for bss symbols") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220731164923.691193-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
of_get_next_parent() returns a node pointer with refcount incremented,
we should use of_node_put() on it when not need anymore.
Add missing of_node_put() in the error path to avoid refcount leak.
Fixes: 532ae9ce60f7 ("[CELL] add support for MSI on Axon-based Cell systems") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220605065129.63906-1-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>